ABSTRACT


ON-LINE DIAGNOSIS OF SEQUENTIAL SYSTEMS 

by 

Robert Joseph Sundstrom


In many applications, especially those in which a computer is being used to control some process in real time (e.g., telephone switching, flight control of an aircraft or spacecraft, etc.), it is desirable to monitor the performance of the system constantly, as it is being used, to determine whether the actual system is within tolerance of the intended system. Informally, by "on-line diagnosis" we mean a monitoring process of this type.

This study begins with the introduction of a formal model which can serve as the basis for a theoretical investigation of on-line diagnosis. Within this model a fault of a system S is considered to be a transformation of S into another system S' at some time t. The resulting faulty system is taken to be the system which looks like S up to time t and like S' thereafter. Notions of fault tolerance and error are defined in terms of the resulting system being able to mimic some desired behavior as specified by a system S.

A notion of on-line diagnosis is formulated which involves an external detector and a maximum time delay within which every error caused by a fault in a prescribed set must be detected.

This study focuses on the diagnosis of two important sets of faults: the set of "unrestricted faults" and the set of "unrestricted component faults." The set of unrestricted faults of a system is defined to be simply the set of all possible faults of that system. It is shown that if a system is on-line diagnosable for the set of unrestricted faults, then the detector is at least as complex, in terms of state set size, as the specification. Moreover, this is true even if an arbitrarily large delay is allowed in the diagnosis.

One means of diagnosing the set of unrestricted faults of a system is by duplication and comparison. For systems which have (delayed) inverses (i.e., systems which are information lossless), a possible alternative is the use of a loop check. Here, it is established that if an inverse system is information lossless, then it can always be used for unrestricted fault diagnosis. Although the lossless condition is sufficient, it is shown further that there exist systems for which a lossy inverse can also be used for unrestricted fault diagnosis. Since not every system has an inverse, let alone one which can be used for unrestricted fault diagnosis, it is not always possible to apply this technique directly. However, it is shown that every system has a realization to which this scheme can be successfully applied.
The on-line diagnosis of systems which are structurally decomposed and represented as a network of smaller systems is also investigated. The fault set considered here is the set of unrestricted component faults, namely, the set of faults which affect only one component of the network. A characterization of networks which can be diagnosed using a combinational detector is obtained. It is further shown that any network can be made diagnosable in the above sense through the addition of one component. In addition, a lower bound is obtained on the complexity of any component, the addition of which is sufficient to make a particular network combinationally diagnosable.
CHAPTER I


Introduction 

1.1 Outline of the Problem

For many applications, especially those in which a computer is controlling a real-time process (e.g., telephone switching, flight control of an aircraft or spacecraft, control of traffic in a transportation system, etc.), reliability is a major factor in the design of the system. The need for high reliability arises because of the serious consequences errors may have in terms of danger to human lives, loss of costly equipment, or disruption of business or manufacturing operations. For example, it is economically unsound to shut down a steel mill for even a short time in order to repair a comparatively inexpensive controlling computer. The seriousness of the consequences, of course, depends upon the application and must be weighed against the cost of improving the reliability.

A number of techniques exist for improving computer reliability. 
One of the more obvious is the use of more reliable components. 
While the use of reliable components is clearly very important, it 
has been recognized that this technique alone is not sufficient to meet 
the requirements for modern ultrareliable computing systems [35]. 
Another general technique which is useful in some applications is the use of masking redundancy, such as Triple Modular Redundancy. The reader is referred to Short [35] for a general survey of masking techniques. One major drawback to masking redundancy is that if failed components are not replaced and the mission time is long, then the reliability of a system which uses masking redundancy can actually be less than that of the corresponding simplex system [25].

A third means of increasing system reliability and availability is through fault diagnosis and subsequent system reconfiguration or repair. For example, the No. 1 Electronic Switching System (ESS), a computer designed to control telephone switching, contains duplicates of each module, and fault diagnosis is achieved primarily by dynamically comparing the outputs of both modules [11]. Once a fault is detected, the faulty module is identified and removed from service under program control. The faulty module is then repaired manually with diagnostic help from the fault-free computer. Another ultra-reliable computer, the Jet Propulsion Laboratory Self-Testing and Repairing (STAR) computer, also makes use of modularity and standby sparing [4].

One means of performing fault diagnosis is to monitor continuously the performance of the system, as it is being used, to determine whether its actual behavior is tolerably close to the intended behavior. It is this sort of monitoring which we mean by the term "on-line diagnosis." Others have used the term "error detection" to refer to this sort of monitoring ([22], [23]).

Implementation of on-line diagnosis may be external to the system, both internal and external, or completely internal. In the last extreme, on-line diagnosis is sometimes referred to as "self-diagnosis" or "self-checking" ([8], [9]).

The signals generated by a monitoring device can be used in many ways. For example, the IBM System/360 utilizes checking circuits to detect errors [6]. The signals generated by these circuits are used in some models to freeze the computer so that the instruction which was currently executing may be retried if possible, and to assist in the checkout and repair of the computer if the automatic retry attempt fails. Ultra-reliable computers typically use the signals generated by the monitoring device to provide the computer system with the information it needs to reconfigure itself automatically so as to avoid using any faulty circuits. One other use for such signals is simply to inform the system user that the system is not operating properly and that there may be errors in his data.

In general, on-line diagnosis is used to signal that the system is operating properly or that it is in need of repair. In most computer systems this task is also performed in some part by "off-line diagnosis." By off-line diagnosis we are referring to the process of removing the system from its normal operation and applying a series of prearranged tests to determine whether any faults are present in the system. There are major differences between on-line and off-line diagnosis, and it is important to be aware of the capabilities and the limitations of each.

One basic difference is that on-line diagnosis is a continuous process whereas off-line diagnosis has a periodic nature. Transient faults are difficult to diagnose with off-line diagnosis because if a fault is transient in nature it may not be in the system when it is tested. On the other hand, since on-line diagnosis is a continuous monitoring process, both permanent and transient faults can be diagnosed. It has been recognized by Ball and Hardie [5] and others that intermittents do occur frequently, and that finding an orderly means to diagnose them is an important unsolved problem. Thus the inability of off-line diagnosis to deal satisfactorily with transients is a severe limitation.

Another basic difference is that the delay between the occurrence of a fault and its subsequent detection is generally greater for off-line than for on-line diagnosis. Recovery after a fault has been diagnosed may sometimes be achieved by reconfiguration and restarting. However, in a real-time application, unrepeatable or irreversible events may take place if an error occurs and is not immediately detected.

In any application, if there is a delay between the occurrence of an error and the subsequent diagnosis of a fault, then contamination of data bases may occur, thus making restarting difficult. For these reasons, the inherent delay associated with off-line diagnosis can be a serious limitation.

One further difference between on-line and off-line diagnosis is 
that with off-line diagnosis the system must be removed from its 
normal operation to apply the tests. This also may not be acceptable 
in a real-time application. 

The cost of either form of diagnosis depends on the nature of the system to be diagnosed, the technology to be used in building the system, and the degree of protection against faulty operation that is required. With on-line diagnosis the cost lies almost entirely in the design, construction, and maintenance of extra hardware. With off-line diagnosis the cost lies in the initial generation of the tests and in the subsequent storage and running of these tests.

In general, off-line diagnosis is useful for factory testing and for applications where immediate knowledge of any faulty behavior is not essential. Off-line diagnosis is also useful for locating the source of trouble once such trouble is indicated by on-line diagnosis. For example, as stated earlier, the Bell System's No. 1 ESS uses duplication and comparison as its primary error detection scheme. But once an error has been detected, off-line diagnosis is used to determine which processor exhibited the erroneous behavior and to locate the faulty module in that processor.
In the Design Techniques for Modular Architecture for Reliable Computing Systems (MARCS) study, a more integrated use of on-line diagnosis is proposed whereby a number of checking circuits observe the performance of various parts of the computer [8]. With a scheme such as this, information about the location of a fault can be obtained from knowledge of which checking circuit indicated the trouble.

Both on-line and off-line diagnosis have been used to check the operation of electronic computers from the first vacuum tube machines until the present time. In particular, off-line diagnosis procedures were developed for the ENIAC computer, the BINAC system had duplicate processors, and the UNIVAC used a more economical on-line diagnosis scheme involving 35 checking circuits [12]. During the past decade, however, the development of theory and techniques for fault diagnosis in digital systems and circuits has focused mainly on problems of off-line diagnosis (see [9] and [14], for example).

An alternative means of performing diagnosis has been investigated by White [37]. His novel scheme is similar to on-line diagnosis in that it involves redundant processing of information and subsequent checking for consistency. However, with his scheme the redundancy is in time rather than in space. After every operation is performed, a related operation is initiated which uses the same circuitry but with different signals. The results of these two operations are then checked for consistency.

This scheme is useful for checking machines which were not designed with the additional circuitry required for on-line diagnosis. However, this technique is likely to be very expensive in terms of both operating speed and microprogram memory requirements. In an example implemented by White, a self-checking microprogram to emulate the PDP-8/I on the Meta 4 ran an estimated 3.9 times slower than a non-checking version of this microprogram and used 5 times as much microprogram memory.

One other approach to diagnosis is simply to have human users 
or observers of the system watch for obvious misbehavior. Since 
faults often give rise to behaviors which are clearly erroneous, many 
faults can be detected in this manner. The effectiveness of this method 
is highly dependent upon the individual system and program, and is 
exceedingly difficult to evaluate. It seems reasonable to assume, 
however, that this method is less effective than any of the methods 
previously discussed. Certainly, this method is unacceptable for 
many applications. 
1.2 Brief Survey of the Literature

The work that has been done on on-line diagnosis has been mainly concerned with the development of specific diagnosis techniques. One early paper is Kautz's study [19] of fault detection techniques for combinational circuits. In this paper he investigated a number of techniques, including the use of codes and the possibility of greater economy if immediate detection of errors was not necessary. Some of the more common on-line diagnosis techniques are discussed in a book by Sellers, Hsiao, and Bearnson [34]. Much of what is in this book, and a large portion of the techniques that can be found elsewhere in the literature, are concerned with special circuits such as adders and counters. For example, see the work of Avizienis [3], Rao [33], Dorr [10], and Wadia [36].

Relatively little work can be found on the theory of on-line diagnosis. As with the investigation of on-line diagnosis techniques, much of the theory of on-line diagnosis focuses on arithmetic units. In one of the earliest works of a theoretical nature, Peterson [30] showed that an adder can be checked using a completely independent circuit which adds the residues, modulo some base, of the operands. He went on to show that any independent check of this type was a residue class check. Further theoretical work concerning the diagnosis of arithmetic units using residue codes can be found in Massey [24] and Peterson [32].
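The idea of a residue check can be made concrete with a minimal sketch. It is illustrative only and assumes that the adder's operands and output are non-negative Python integers, that the check modulus is m = 3 (an arbitrary choice), and that `residue_check` is a hypothetical name. The checker never forms the full sum; it adds only the residues of the operands and compares the result with the residue of the adder's output.

```python
def residue_check(a: int, b: int, reported_sum: int, m: int = 3) -> bool:
    """Return True if reported_sum agrees with a + b modulo m.

    The checker works only on residues, independently of the adder:
    it adds a mod m and b mod m and compares with reported_sum mod m.
    """
    checker_residue = (a % m + b % m) % m
    return reported_sum % m == checker_residue


print(residue_check(25, 17, 42))  # True: the correct sum passes
print(residue_check(25, 17, 43))  # False: an error of +1 is detected
print(residue_check(25, 17, 45))  # True: an error of +3 (a multiple of m) escapes
```

As the last call suggests, an error that changes the sum by a multiple of the modulus is invisible to the check, so the choice of modulus governs which errors are detectable.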
An early theoretical result of a more general nature was published by Peterson and Rabin [31]. They showed that combinational circuits can differ greatly in their inherent diagnosability and that in some cases virtual duplication is necessary.

A later and very important paper is that of Carter and Schneider [7]. They propose a model for on-line diagnosis which involves a system and an external checker. The input and output alphabets of the system are encoded, and the checker detects faults by indicating the appearance of a non-code output. A system is self-checking if, for every fault in some prescribed set, (i) the system produces a non-code output for at least one code space input, and (ii) the system never produces incorrect code space outputs for code space inputs. Thus, (i) ensures that every fault can be detected during normal usage, and (ii) ensures that if no fault has been detected, then the output can be relied upon to be correct. The checkers that they consider are also self-checking. Using this model they prove that any system can be designed to be self-checking for the set of single faults.

Anderson [1] has named property (i) "self-testing" and property (ii) "fault-secure," and he has investigated these properties for combinational networks. In Chapter III it is shown that the notion of diagnosis considered in this study is a generalization of the fault-secure property.
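For a small combinational circuit, both properties can be checked exhaustively. The sketch below is ours, not Carter and Schneider's or Anderson's formulation: a circuit is modeled as a Python function on bit strings, the code spaces are explicit sets, each fault is represented by the resulting faulty function, and the duplicated AND gate and its faults are hypothetical examples.

```python
from typing import Callable, Iterable, Set

Circuit = Callable[[str], str]


def is_self_testing(faulty: Circuit, code_inputs: Iterable[str],
                    code_outputs: Set[str]) -> bool:
    """Property (i): some code-space input yields a non-code output."""
    return any(faulty(x) not in code_outputs for x in code_inputs)


def is_fault_secure(good: Circuit, faulty: Circuit, code_inputs: Iterable[str],
                    code_outputs: Set[str]) -> bool:
    """Property (ii): no code-space input yields an incorrect code-space output."""
    return all(faulty(x) == good(x) or faulty(x) not in code_outputs
               for x in code_inputs)


# Hypothetical example: a duplicated AND gate whose output "yy" is a code
# word only when both copies agree.
CODE_INPUTS = ["00", "01", "10", "11"]
CODE_OUTPUTS = {"00", "11"}


def and_bit(x: str) -> str:
    return str(int(x[0]) & int(x[1]))


def good(x: str) -> str:
    return and_bit(x) * 2        # both copies compute AND


def copy1_stuck_at_0(x: str) -> str:
    return "0" + and_bit(x)      # a single fault in the first copy


def both_stuck_at_0(x: str) -> str:
    return "00"                  # the same fault in both copies


print(is_self_testing(copy1_stuck_at_0, CODE_INPUTS, CODE_OUTPUTS))        # True
print(is_fault_secure(good, copy1_stuck_at_0, CODE_INPUTS, CODE_OUTPUTS))  # True
print(is_fault_secure(good, both_stuck_at_0, CODE_INPUTS, CODE_OUTPUTS))   # False
```

The double fault illustrates why the prescribed fault set matters: when both copies fail identically, the output stays in the code space but is wrong, so property (ii) fails.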


1.3 Synopsis of the Report

This report describes a formal investigation of the theory and techniques applicable to the on-line diagnosis of sequential systems. The formal approach taken in this report leads to a fuller understanding of current on-line diagnosis practices and suggests generalizations of known techniques. It also provides a framework for evaluating the advantages and limitations of the various on-line diagnosis schemes.

With the decreasing cost of logic and the increasing use of computers in real-time applications where erroneous operation can result in the loss of human life and/or large sums of money, the use of on-line diagnosis can be expected to increase greatly in the near future. The importance of this area, along with the relative lack of theoretical results, is our motivation for initiating this study of on-line diagnosis.

Before entering into the actual synopsis, it is appropriate to discuss the objectives of this investigation. Let $\hat{S}$ be a system which serves as a specification of some desired behavior, let $F$ be a set of faults, let $\mathcal{D}$ be a set of possible external detectors, and let $k$ be a maximum time delay within which every error caused by a fault in $F$ must be detected. The basic on-line diagnosis problem can now be stated as follows:

Given $\hat{S}$, $F$, $\mathcal{D}$, and $k$, find an (economical) realization $S$ of $\hat{S}$ and a detector $D \in \mathcal{D}$ such that $D$ can observe $S$ and signal within $k$ time steps any error caused by a fault in $F$.
Toward the end of solving this basic problem, the following questions have been formulated. These questions serve as more specific objectives, and their answers will help to solve the basic on-line diagnosis problem.

I. What are good on-line diagnosis techniques? That is, what 
good means are available for finding appropriate realizations and 
detectors? When is each technique applicable? 

II. Given $\hat{S}$, $S$, $F$, $\mathcal{D}$, and $k$, does a suitable detector exist in $\mathcal{D}$? That is, when is a given realization diagnosable? If such a detector exists, how can it be constructed? A solution to this problem would certainly help to solve the previous one.

III. What time-space tradeoffs are possible between the added complexity needed for diagnosis and the maximum allowable delay? We expect that there will be situations where, if the detector is given additional time in which to indicate an error, diagnosis may be simplified.

IV. What relationships exist between faults and errors? Given $S$ and $F$, what errors are possible? Given $\hat{S}$ and $F$, how can one find a realization $S$ of $\hat{S}$ such that the system with faults $(S, F)$ gives rise only to errors of a given type? These are important questions because, given a diagnosis technique or a particular type of detector, it will often be easy to determine just what types of errors are detectable. The faults that are diagnosable will then have to be inferred from this information. Conversely, we will want to find realizations such that the faults we are concerned with will cause errors that we can detect.

V. What properties of system structure and system behavior 
are conducive to on-line diagnosability? Structural and behavioral 
properties are important for it is expected that they will relate 
directly to diagnosis techniques. Behavioral properties could be 
used to measure the inherent diagnosability of a given behavior in 
terms of the minimum added complexity which would be required to 
obtain a given level of on-line diagnosis. 

The first problem considered in this investigation was the formulation of a formal model which could serve as a basis for a theoretical study of on-line diagnosis. This model is developed fully in Chapter II. First, an appropriate class of system models is formulated which can represent both the behavior and the structure of fault-free and faulty systems. Then notions of realization, fault, fault tolerance, and diagnosability are formalized which have meaningful interpretations in the context of on-line diagnosis. The following chapters are all concerned with the properties of the notion of diagnosis which is introduced in Chapter II.

Chapter III contains some elementary properties of diagnosis which are independent of the particular class of faults under consideration. The results of this chapter help to give a basic understanding of on-line diagnosis and are used in the later chapters.
Chapter IV is concerned with the diagnosis of the set of unrestricted faults. This set of faults is simply the set of all possible faults of the system under consideration. The major result of this chapter gives a lower bound on the complexity of any detector which can be used for unrestricted fault diagnosis of a given system.

In Chapter V, the use of inverse systems for the diagnosis of unrestricted faults is considered. Inverse systems are formally introduced, and a partial characterization of those inverse systems which can be used for unrestricted fault diagnosis is obtained. Since not every system has an inverse system, let alone one which is suitable for unrestricted fault diagnosis, it is not always possible to apply this technique directly. However, it is shown that every system has a realization upon which this technique can be successfully applied.

In Chapter VI, the diagnosis of systems which are structurally 
decomposed and are represented as a network of smaller systems 
is studied. The fault set considered here is the set of faults which 
only affect one component system in the network. A characterization 
of those networks which can be diagnosed using a purely combinational 
detector is achieved. A technique is given which can be used to realize 
any network by a network which is diagnosable in the above sense. 
Limits are found on the amount of redundancy involved in any such 
technique. 



CHAPTER II 


A Model for the Study of On-Line Diagnosis 

In this chapter a formal model is developed which is suitable for a theoretical study of on-line diagnosis of sequential systems. The development begins with the introduction of a class of system models, called "resettable discrete-time systems," which will serve as the basis of this study. Within this model a fault of a system $S$ is considered to be a transformation of $S$ into another system $S'$ at some time $\tau$. The resulting faulty system is taken to be the system which looks like $S$ up to time $\tau$ and like $S'$ thereafter. Next, the companion notions of fault tolerance and error are defined in terms of the resulting system being able to mimic some desired behavior. Finally, a notion of on-line diagnosis is introduced. This notion involves an external detector and a maximum time delay within which every error caused by a fault in some prescribed set must be detected.
2.1 Resettable Discrete-Time Systems

On-line diagnosis is inherently a more complex process than off-line diagnosis because of two complicating factors: (i) it has to deal with input over which it has no control, and (ii) faults can occur as the system is being diagnosed. We would like to build a theory of on-line diagnosis using conventional models of time-invariant (stationary, fixed) systems (e.g., sequential machines, sequential networks, etc.). However, due to the second factor mentioned above, these conventional models can no longer be used to represent the dynamics of the system as it is being diagnosed. A system which is designed and built to behave in a time-invariant manner becomes a time-varying system as faults occur while it is in use. Therefore, a more general representation based on time-varying systems is required. Based on this fundamental observation, we have developed what we believe to be an appropriate model for the study of on-line diagnosis.

Definition 2.1: Relative to the time base $T = \{\ldots, -1, 0, 1, \ldots\}$, a discrete-time system (with finite input and output alphabets) is a system

$$S = (I, Q, Z, \delta, \lambda)$$

where $I$ is a finite nonempty set, the input alphabet;
$Q$ is a nonempty set, the state set;
$Z$ is a finite nonempty set, the output alphabet;
$\delta: Q \times I \times T \to Q$, the transition function;
$\lambda: Q \times I \times T \to Z$, the output function.

The interpretation of a discrete-time system is a system which, if at time $t$ it is in state $q$ and receives input $a$, will at time $t$ emit output symbol $\lambda(q, a, t)$ and at time $t + 1$ be in state $\delta(q, a, t)$. In the special case where the functions $\delta$ and $\lambda$ are independent of time (i.e., are time-invariant), the definition reduces to that of a (Mealy) sequential machine. In the discussion that follows it is assumed that $S$ is finite-state (i.e., $|Q| < \infty$).
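The tuple of Definition 2.1 maps directly onto a small data structure. The sketch below is illustrative only: the class `DiscreteTimeSystem`, the helper `from_mealy`, and the use of Python integers for the time base $T$ are our assumptions. Passing `t` explicitly to `delta` and `lam` is what permits time-varying behavior; a Mealy machine is the special case in which both functions ignore it.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Hashable, Tuple

Transition = Callable[[Hashable, Hashable, int], Hashable]


@dataclass(frozen=True)
class DiscreteTimeSystem:
    """S = (I, Q, Z, delta, lam) over the time base T (Python ints)."""
    inputs: FrozenSet[Hashable]    # I: finite nonempty input alphabet
    states: FrozenSet[Hashable]    # Q: state set (finite here)
    outputs: FrozenSet[Hashable]   # Z: finite nonempty output alphabet
    delta: Transition              # delta: Q x I x T -> Q
    lam: Transition                # lam:   Q x I x T -> Z

    def step(self, q: Hashable, a: Hashable, t: int) -> Tuple[Hashable, Hashable]:
        """In state q at time t with input a: (output at t, state at t + 1)."""
        return self.lam(q, a, t), self.delta(q, a, t)


def from_mealy(inputs, states, outputs, delta2, lam2) -> DiscreteTimeSystem:
    """Embed a time-invariant (Mealy) machine: delta and lam ignore t."""
    return DiscreteTimeSystem(frozenset(inputs), frozenset(states), frozenset(outputs),
                              lambda q, a, t: delta2(q, a),
                              lambda q, a, t: lam2(q, a))


# Hypothetical example: a parity machine whose state and output are the
# running exclusive-OR of the inputs received so far.
parity = from_mealy({0, 1}, {0, 1}, {0, 1},
                    lambda q, a: q ^ a, lambda q, a: q ^ a)
print(parity.step(0, 1, 7))  # (1, 1)
```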

To describe the behavior of a system, we first extend the transition and output functions to input sequences in the following natural way. If $I^*$ is the set of all finite-length sequences over $I$ (including the null sequence $\Lambda$), then

$$\bar{\delta}: Q \times I^* \times T \to Q$$

where, for all $q \in Q$, $a \in I$, $t \in T$:

$$\bar{\delta}(q, \Lambda, t) = q$$
$$\bar{\delta}(q, a, t) = \delta(q, a, t)$$
$$\bar{\delta}(q, a_1 a_2 \cdots a_n, t) = \delta\big(\bar{\delta}(q, a_1 a_2 \cdots a_{n-1}, t), a_n, t + n - 1\big).$$

Similarly, if $I^+ = I^* - \{\Lambda\}$:

$$\bar{\lambda}: Q \times I^+ \times T \to Z$$

where, for all $q \in Q$, $a \in I$, $t \in T$:

$$\bar{\lambda}(q, a, t) = \lambda(q, a, t)$$
$$\bar{\lambda}(q, a_1 a_2 \cdots a_n, t) = \lambda\big(\bar{\delta}(q, a_1 a_2 \cdots a_{n-1}, t), a_n, t + n - 1\big).$$

Henceforth $\bar{\delta}$ and $\bar{\lambda}$ will be denoted simply as $\delta$ and $\lambda$.

Relative to these extended functions, the behavior of $S$ in state $q$ is the function

$$\beta_q: I^+ \times T \to Z$$

where

$$\beta_q(x, t) = \lambda(q, x, t).$$

Thus, if the state of the system is $q$ and it receives input sequence $x$ starting at time $t$, then $\beta_q(x, t)$ is the output emitted when the last symbol in $x$ is received, i.e., the output at time $t + |x| - 1$ (where $|x| = \text{length}(x)$).
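The recursive definitions of the extended functions unroll into a simple loop. In the sketch below (the function names and the counter example are hypothetical), the $i$-th symbol of $x$ is applied at time $t + i - 1$, exactly as in the recursion for $\bar{\delta}$, and `behavior` returns $\beta_q$ as a function of $(x, t)$. The two calls show that, for a time-varying system, the same input sequence applied in the same state can yield different behavior depending on the starting time.

```python
from typing import Callable, Hashable, Sequence

Fn = Callable[[Hashable, Hashable, int], Hashable]


def delta_bar(delta: Fn, q: Hashable, x: Sequence, t: int) -> Hashable:
    """Extended transition function: symbol a_i of x is applied at time t + i - 1.

    The empty (null) sequence leaves the state unchanged.
    """
    for i, a in enumerate(x):
        q = delta(q, a, t + i)
    return q


def lam_bar(delta: Fn, lam: Fn, q: Hashable, x: Sequence, t: int) -> Hashable:
    """Extended output function on nonempty x: the output at time t + |x| - 1."""
    if len(x) == 0:
        raise ValueError("lam_bar is defined only on I+ (nonempty sequences)")
    n = len(x)
    return lam(delta_bar(delta, q, x[:-1], t), x[-1], t + n - 1)


def behavior(delta: Fn, lam: Fn, q: Hashable) -> Callable[[Sequence, int], Hashable]:
    """beta_q(x, t) = lam_bar(q, x, t)."""
    return lambda x, t: lam_bar(delta, lam, q, x, t)


# Hypothetical time-varying system: a counter that adds its input while t < 3
# and is frozen afterwards; the output reports (state, time) to expose t.
def counter_delta(q, a, t):
    return q + a if t < 3 else q


def counter_lam(q, a, t):
    return (q, t)


beta_0 = behavior(counter_delta, counter_lam, 0)
print(beta_0([1, 1, 1, 1, 1], 0))   # (3, 4)
print(beta_0([1, 1, 1, 1, 1], 10))  # (0, 14)
```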

Many investigations of on-line diagnosis and fault tolerance have studied redundancy schemes such as duplication and triplication. Typically they have not dealt with the problem of starting each copy of a machine in the same state. In this study we will be examining these schemes and others for which the same problem arises. Since many existing systems have reset capabilities, and since this feature solves the above synchronizing problem, we will use a special type of system for which the reset capabilities are explicitly specified. This explicit specification of the reset capability is essential since it is an important part of the total system and it may be subject to failure.
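The synchronizing problem shows up in a minimal sketch of duplication and comparison. The function and both machines below are hypothetical and, for simplicity, time-invariant; each copy is modeled as a function from (state, input) to (next state, output). Two fault-free copies started in different states raise a false alarm, which is precisely what a common reset prevents.

```python
from typing import Callable, Hashable, Optional, Sequence, Tuple

Step = Callable[[Hashable, Hashable], Tuple[Hashable, Hashable]]  # (q, a) -> (next q, output)


def duplicate_and_compare(step_a: Step, step_b: Step, qa: Hashable, qb: Hashable,
                          x: Sequence) -> Optional[int]:
    """Run two copies side by side; return the first step at which their
    outputs differ, or None if they agree on all of x."""
    for t, a in enumerate(x):
        qa, za = step_a(qa, a)
        qb, zb = step_b(qb, a)
        if za != zb:
            return t
    return None


# Hypothetical machines: a mod-4 counter that outputs its present state,
# and a faulty copy that gets stuck once it reaches state 2.
def counter(q, a):
    return (q + a) % 4, q


def stuck_at_2(q, a):
    return (q if q == 2 else (q + a) % 4), q


print(duplicate_and_compare(counter, counter, 0, 0, [1, 1, 1, 1]))     # None
print(duplicate_and_compare(counter, counter, 0, 1, [1, 1, 1, 1]))     # 0
print(duplicate_and_compare(counter, stuck_at_2, 0, 0, [1, 1, 1, 1]))  # 3
```

The second call is the false alarm caused by unsynchronized copies; the third is a genuine fault, detected one step after the faulty copy stops advancing.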

Definition 2.2: A resettable discrete-time system (resettable system) is a system

$$S = (I, Q, Z, \delta, \lambda, R, \rho)$$

where $(I, Q, Z, \delta, \lambda)$ is a discrete-time system;
$R$ is a finite nonempty set, the reset alphabet;
$\rho: R \times T \to Q$, the reset function.

A resettable system is resettable in the sense that if reset $r$ is applied at time $t - 1$, then $\rho(r, t)$ is the state at time $t$. This method of specifying reset capability is a matter of convenience. This feature could just as well have been incorporated as a restriction on the transition function relative to a distinguished subset of input symbols called the reset alphabet. Thus a resettable discrete-time system can indeed be regarded as a special type of discrete-time system. If $\delta$, $\lambda$, and $\rho$ are all independent of time, the definition reduces to that of a resettable sequential machine. Thus a resettable machine can be viewed as a resettable system which is invariant under time translations.
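The remark that the reset capability could instead be folded into the transition function can be made concrete. In this sketch, the names and the example are hypothetical, and we assume that the input and reset alphabets are disjoint: receiving reset symbol $r$ at time $t - 1$ moves the system to $\rho(r, t)$ at time $t$, whatever its current state. Because $\rho$ depends on time, a failure of the reset circuitry can be represented as well.

```python
from typing import Callable, Hashable, Set

Delta = Callable[[Hashable, Hashable, int], Hashable]
Rho = Callable[[Hashable, int], Hashable]


def fold_resets(delta: Delta, rho: Rho, resets: Set[Hashable]) -> Delta:
    """Transition function over the enlarged alphabet I ∪ R (I and R assumed disjoint).

    A reset symbol r received at time t - 1 puts the system in state rho(r, t)
    at time t, regardless of the current state; ordinary inputs use delta.
    """
    def delta_ext(q, a, t):
        if a in resets:
            return rho(a, t + 1)
        return delta(q, a, t)
    return delta_ext


# Hypothetical example: a mod-3 counter whose reset "r" normally leads to
# state 0 but, after a fault at time 100, leads to state 1 instead.
def count_mod3(q, a, t):
    return (q + a) % 3


def reset_fn(r, t):
    return 0 if t < 100 else 1


delta_ext = fold_resets(count_mod3, reset_fn, {"r"})
print(delta_ext(1, 1, 5))     # 2: ordinary input
print(delta_ext(2, "r", 5))   # 0: reset released at time 6
print(delta_ext(2, "r", 99))  # 1: reset released at time 100 uses the faulty rho
```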

Given a resettable system, we can view it as a system organized as in Fig. 2.1.

Fig. 2.1. Schematic Diagram for $S = (I, Q, Z, \delta, \lambda, R, \rho)$


In many discussions the output function of a system will not be of direct concern; the focus of attention will be upon the state transitions. This motivates the following definition.

Definition 2.3: A resettable discrete-time system $S = (I, Q, Z, \delta, \lambda, R, \rho)$ is a resettable state system if $Z = Q$ and $\lambda(q, a, t) = q$ for all $q \in Q$, $a \in I$, and $t \in T$.

Since the output alphabet and output function of a resettable state system need not be explicitly specified, a resettable state system $S = (I, Q, Z, \delta, \lambda, R, \rho)$ will be denoted by the 5-tuple $(I, Q, \delta, R, \rho)$.

This formulation of resettable state systems as special types of 
resettable systems allows us to directly apply the following theory of 
on-line diagnosis to state machines. 
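The 5-tuple notation is shorthand for a 7-tuple whose output part is fixed. A minimal sketch of that expansion, with a hypothetical helper `expand_state_system` and a made-up two-state toggle as the example:

```python
from typing import Callable, Iterable, Tuple


def expand_state_system(inputs: Iterable, states: Iterable, delta: Callable,
                        resets: Iterable, rho: Callable) -> Tuple:
    """Expand (I, Q, delta, R, rho) into (I, Q, Z, delta, lam, R, rho) with Z = Q
    and lam(q, a, t) = q, as in Definition 2.3."""
    Q = frozenset(states)

    def lam(q, a, t):
        return q  # the output is the present state

    return (frozenset(inputs), Q, Q, delta, lam, frozenset(resets), rho)


# Hypothetical two-state toggle with a single reset to state "A".
toggle = expand_state_system(
    {"x"}, {"A", "B"},
    lambda q, a, t: "B" if q == "A" else "A",
    {"r"}, lambda r, t: "A")
I, Q, Z, delta, lam, R, rho = toggle
print(Z == Q, lam("B", "x", 3))  # True B
```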
Notation: Resettable systems will be denoted by $S, S', S_1, S_2$, etc., and resettable machines will be denoted by $M, M', M_1, M_2$, etc. Unless otherwise specified, $M$ will denote the resettable machine $(I, Q, Z, \delta, \lambda, R, \rho)$; $M'$ will denote the resettable machine $(I', Q', Z', \delta', \lambda', R', \rho')$; and so forth. $\mathcal{S}(I, Z, R)$ will denote the set of systems with input alphabet $I$, output alphabet $Z$, and reset alphabet $R$. That is,

$$\mathcal{S}(I, Z, R) = \{S' \mid S' = (I, Q', Z, \delta', \lambda', R, \rho')\}.$$

$\mathcal{M}(I, Z, R)$ will denote the corresponding set of resettable machines.

Definition 2.4: A resettable sequential machine $M = (I, Q, Z, \delta, \lambda, R, \rho)$ is memoryless or combinational if $|Q| = 1$.

The triple $(I, Z, \lambda)$, where $\lambda: I \to Z$, will be used to denote any memoryless machine with input alphabet $I$, output alphabet $Z$, and output function $\lambda$. The memoryless machine $M = (I, Z, \lambda)$ is said to realize the function $\lambda$ from $I$ into $Z$.
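Because a memoryless machine has exactly one state, its response to a sequence is simply its output function applied symbol by symbol, independent of history. A hypothetical sketch, with exclusive-OR as the function being realized:

```python
from typing import Callable, Hashable, List, Sequence


def run_memoryless(lam: Callable[[Hashable], Hashable], x: Sequence) -> List:
    """Outputs of the memoryless machine (I, Z, lam) on the input sequence x.

    With a single state there is nothing to remember, so each output
    depends only on the current input symbol.
    """
    return [lam(a) for a in x]


# Hypothetical example: I = {0,1} x {0,1}, Z = {0,1}, lam = exclusive-OR.
def xor(a):
    return a[0] ^ a[1]


print(run_memoryless(xor, [(0, 1), (1, 1), (1, 0)]))  # [1, 0, 1]
```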

We will represent sequential machines in the usual manner, i.e., via transition tables or state graphs. Resettable machines are represented by minor extensions of these two methods. The transition table of a resettable machine is identical to that of a machine with the addition of one column on the right to accommodate the reset function. If $\rho(r) = q$, then $r$ will appear in this additional column in the row corresponding to state $q$. Similarly, the state graph of a resettable machine is identical to that of a machine with the addition of one short arrow for each $r \in R$. This arrow will be labeled $r$ and will point to state $\rho(r)$.

Example 2.1: Let $M_1$ be the sequence generator with reset alphabet $\{0\}$ and input alphabet $\{1\}$ which has been implemented by the circuit in Fig. 2.2.

Fig. 2.2. Circuit for $M_1$

The transition table and the state graph for $M_1$ are shown in Figs. 2.3 and 2.4.


Present state   Input 1 (next state/output)
00              01/0
01              11/1
10              00/1
11              10/1

Fig. 2.3. Transition Table for $M_1$
Fig. 2.4. State Graph for $M_1$

The circuit in Fig. 2.2 is also an implementation of a similar machine $M_2$ with input alphabet $\{0, 1\}$. The state graph for $M_2$ is shown in Fig. 2.5.

Thus, in $M_2$ the input symbol "0" can be interpreted as an input or as a reset. In $M_2$ the outputs for input 0 are explicitly specified, whereas in $M_1$ they may be regarded as classical "don't cares."
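The transition table of $M_1$ can be exercised directly. The sketch below encodes Fig. 2.3 as a dictionary; it does not model the reset and simply starts from state 00, which is an assumption rather than a stated value of $\rho(0)$. The function name `run_m1` is hypothetical.

```python
from typing import Dict, List, Tuple

# Transition table of M1 (Fig. 2.3): present state -> (next state, output)
# for the single input symbol 1.
M1: Dict[str, Tuple[str, str]] = {
    "00": ("01", "0"),
    "01": ("11", "1"),
    "10": ("00", "1"),
    "11": ("10", "1"),
}


def run_m1(start: str, n: int) -> Tuple[List[str], List[str]]:
    """Apply input 1 n times starting in `start`; return (states visited, outputs)."""
    q, states, outputs = start, [start], []
    for _ in range(n):
        q, z = M1[q]
        states.append(q)
        outputs.append(z)
    return states, outputs


print(run_m1("00", 4))
# (['00', '01', '11', '10', '00'], ['0', '1', '1', '1'])
```

Under input 1 the machine cycles through all four states in the order 00, 01, 11, 10, so its output sequence repeats with period 4.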
We can view a particular discrete-time system as a system which looks like some machine $M_i$ in one time interval, like $M_{i+1}$ in another interval, and so on. This is also a good means of specifying a system.

[Figure: successive time intervals along a time axis, labeled $M_i$, $M_{i+1}$, $M_{i+2}$]

Fig. 2.6. A Discrete-Time System

Example 2.2: Suppose that $M_1$ was implemented as in Fig. 2.2 and
that this circuit operated correctly up to time 100, when gate 2 became
stuck-at-0. What actually existed was not a resettable machine but a
(time-varying) resettable system $S$ which looks like $M_1$ up to time 100
and like a different machine, say $M_1'$, thereafter. The graph for $M_1'$ is
shown in Fig. 2.7.

Fig. 2.7. Resettable Machine $M_1'$
We can represent $S$ as follows:
$$S = \begin{cases} M_1 & \text{for } t < 100 \\ M_1' & \text{for } t \ge 100. \end{cases}$$
By this we mean that $I = I_1 = I_1'$, and likewise for $Q$, $Z$, and $R$, and that
$$\delta(q, a, t) = \begin{cases} \delta_1(q, a) & \text{for } t < 100 \\ \delta_1'(q, a) & \text{for } t \ge 100, \end{cases}$$
and similarly for $\lambda$ and $\rho$.

For resettable systems we take the definitions of $\delta$, $\lambda$, and $\beta$
to be the same as those for systems. It is also convenient in the case
of resettable systems to specify behavior relative to a reset input $r$
that is released at time $t$; that is, the behavior of $S$ for condition $(r, t)$
($r \in R$, $t \in T$) is the function
$$\beta_{r,t}: I^+ \to Z$$
where
$$\beta_{r,t}(x) = \beta_{\rho(r,t)}(x, t).$$
If $t = 0$, $\beta_{r,0}$ is referred to as the behavior of $S$ for initial reset $r$
and is denoted simply as $\beta_r$.
It is useful to extend the behavior function $\beta_{r,t}$, in a natural
manner, to represent the sequence-to-sequence behavior of $S$. For
$r \in R$ and $t \in T$,
$$\hat\beta_{r,t}: I^+ \to Z^+$$
where for all $a_1 \ldots a_n \in I^+$
$$\hat\beta_{r,t}(a_1 \ldots a_n) = \beta_{r,t}(a_1)\,\beta_{r,t}(a_1 a_2) \cdots \beta_{r,t}(a_1 a_2 \ldots a_n).$$

We will now introduce a few properties of resettable machines
which will be important to our developing model of on-line diagnosis.
A more complete treatment of the properties of resettable machines
can be found in the appendix.

These properties are defined for resettable machines rather
than for resettable systems because they will be applied to "fault-free"
systems, which in this study are always time-invariant.

We begin with some concepts of "reachability." Let $M$ be a
resettable machine. The reachable part of $M$, denoted by $P$, is the
set
$$P = \{\delta(\rho(r), x) \mid r \in R,\ x \in I^*\}.$$
$M$ is reachable if $P = Q$. $M$ is $\ell$-reachable if
$$P = \{\delta(\rho(r), x) \mid r \in R,\ x \in I^* \text{ and } |x| \le \ell\}.$$
Note that a machine can be $\ell$-reachable but not reachable.
An elementary result of graph theory states that in a directed
graph with $n$ points, if a point $v$ can be reached from a point $u$ then
there is a path of length $n - 1$ or less from $u$ to $v$. An immediate
consequence of this is that any machine $M$ is $(|P| - 1)$-reachable.
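The bound $|P| - 1$ can be checked mechanically. The Python sketch below (function names are illustrative, not from the text) computes, by breadth-first search from the reset states, the length of a shortest input word reaching each state of $P$; the largest such length is the least $\ell$ for which $M$ is $\ell$-reachable. The example uses the next-state function of $M_1$ with an assumed reset to state 00 and an extra, hypothetical state X that nothing reaches, so the machine is 3-reachable but not reachable.

```python
from collections import deque


def reach_depths(delta, inputs, reset_states):
    """Breadth-first search from the reset states rho(R).

    Returns {state: length of a shortest input word x with delta(rho(r), x) = state}
    for every state in the reachable part P.
    """
    depth = {q: 0 for q in reset_states}
    frontier = deque(depth)
    while frontier:
        q = frontier.popleft()
        for a in inputs:
            nxt = delta[(q, a)]
            if nxt not in depth:
                depth[nxt] = depth[q] + 1
                frontier.append(nxt)
    return depth


def smallest_ell(delta, inputs, reset_states):
    """Least l for which the machine is l-reachable."""
    return max(reach_depths(delta, inputs, reset_states).values())


delta1 = {("00", "1"): "01", ("01", "1"): "11", ("10", "1"): "00",
          ("11", "1"): "10", ("X", "1"): "X"}
P = reach_depths(delta1, ["1"], ["00"])
print(smallest_ell(delta1, ["1"], ["00"]), len(P) - 1)  # 3 3
```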

Let $M, M' \in \mathcal{M}(I, Z, R)$. $M$ is equivalent to $M'$ (written $M \equiv M'$)
if $\beta_r = \beta'_r$ for all $r \in R$. Two states $q \in Q$ and $q' \in Q'$ are equivalent
($q \equiv q'$) if $\beta_q = \beta'_{q'}$. It is easily verified that these are both equivalence
relations, the first on $\mathcal{M}(I, Z, R)$ and the second on the states of machines
in $\mathcal{M}(I, Z, R)$.

A resettable machine $M$ is reduced if for all $q, q' \in P$, $q \equiv q'$
implies $q = q'$. A basic result of sequential machine theory states that
for every machine there is an equivalent reduced machine and that this
machine is unique up to isomorphism. The corresponding result for
resettable machines is given in the appendix.
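A standard way to compute the state equivalence $q \equiv q'$ for a finite, time-invariant machine is partition refinement (Moore's algorithm). The Python sketch below implements that textbook technique under the assumption that the machine is given by finite Mealy tables; it is not necessarily the construction used in the appendix. Merging each class found among the states of $P$ would then yield a reduced machine.

```python
def equivalence_classes(states, inputs, delta, lam):
    """Group states of a time-invariant Mealy machine by their behavior beta_q.

    Moore-style partition refinement: start from one-step outputs, then
    split any block whose members move to different blocks, until stable.
    """
    def relabel(signature):
        ids = {}
        return {q: ids.setdefault(signature[q], len(ids)) for q in states}

    block = relabel({q: tuple(lam[(q, a)] for a in inputs) for q in states})
    while True:
        refined = relabel({
            q: (block[q], tuple(block[delta[(q, a)]] for a in inputs))
            for q in states
        })
        # Refinement only splits blocks, so an unchanged count means stability.
        if len(set(refined.values())) == len(set(block.values())):
            break
        block = refined
    classes = {}
    for q in states:
        classes.setdefault(block[q], []).append(q)
    return sorted(sorted(c) for c in classes.values())


# M_1 (Fig. 2.3): all four states turn out to be pairwise inequivalent.
states = ["00", "01", "10", "11"]
delta = {("00", "1"): "01", ("01", "1"): "11", ("10", "1"): "00", ("11", "1"): "10"}
lam = {("00", "1"): "0", ("01", "1"): "1", ("10", "1"): "1", ("11", "1"): "1"}
print(equivalence_classes(states, ["1"], delta, lam))
```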

A concept which is central to sequential machine theory is that of
a "realization." The corresponding resettable machine concept will
be very important to our theory of on-line diagnosis. We will
introduce it by first stating Meyer and Zeigler's definition of realization for
sequential machines [27].

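Definition 2.5: If $\bar M$ and $M$ are sequential machines then $M$ realizes
$\bar M$ if there is a triple of functions $(\sigma_1, \sigma_2, \sigma_3)$, where $\sigma_1: \bar I^+ \to I^+$ is
a semigroup homomorphism such that $\sigma_1(\bar I) \subseteq I$, $\sigma_2: \bar Q \to Q$, and
$\sigma_3: Z' \to \bar Z$ where $Z' \subseteq Z$, such that for all $q \in \bar Q$
$$\bar\beta_q = \sigma_3 \circ \beta_{\sigma_2(q)} \circ \sigma_1.$$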
It has been shown by Leake [23] that this strictly behavioral 
definition of realization is equivalent to the structurally oriented 
definition of Hartmanis and Stearns [16]. 

If $\bar M$ and $M$ are resettable machines then our definition of
realization is somewhat different. Inherent in this definition is our
presupposition that a resettable system will be reset before every use.

Definition 2.6: If $\bar M$ and $M$ are two resettable machines then $M$ realizes
$\bar M$ if there is a triple of functions $(\sigma_1, \sigma_2, \sigma_3)$, where $\sigma_1: \bar I^+ \to I^+$ is
a semigroup homomorphism such that $\sigma_1(\bar I) \subseteq I$, $\sigma_2: \bar R \to R$, and
$\sigma_3: Z' \to \bar Z$ where $Z' \subseteq Z$, such that for all $r \in \bar R$
$$\bar\beta_r = \sigma_3 \circ \beta_{\sigma_2(r)} \circ \sigma_1.$$
This concept can be viewed pictorially as in Fig. 2.8.

Fig. 2.8. $M$ Realizes $\bar M$ under $(\sigma_1, \sigma_2, \sigma_3)$
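Definition 2.6 can be tested, though not proved, by comparing behaviors on all input words up to a fixed length. In this Python sketch, $\sigma_1$ is given by its action on single symbols (which determines the homomorphism, since $\sigma_1(\bar I) \subseteq I$), and both machines are hypothetical toys: a two-state parity generator $\bar M$ and a mod-4 counter $M$ whose outputs E/O are mapped by $\sigma_3$ to 0/1. The function names are illustrative.

```python
from itertools import product


def run(delta, lam, q, word):
    """Outputs of a time-invariant Mealy machine started in state q."""
    out = []
    for a in word:
        out.append(lam[(q, a)])
        q = delta[(q, a)]
    return out


def realizes_up_to(spec, impl, s1, s2, s3, n):
    """Bounded check of Definition 2.6 on all input words of length 1..n."""
    for r_bar, q_bar in spec["rho"].items():
        q = impl["rho"][s2[r_bar]]
        for k in range(1, n + 1):
            for x in product(spec["inputs"], repeat=k):
                want = run(spec["delta"], spec["lam"], q_bar, x)[-1]
                got = run(impl["delta"], impl["lam"], q, [s1[a] for a in x])[-1]
                # An output outside Z' maps to None and so never matches.
                if s3.get(got) != want:
                    return False
    return True


SPEC = {"inputs": ["a"], "delta": {(0, "a"): 1, (1, "a"): 0},
        "lam": {(0, "a"): "0", (1, "a"): "1"}, "rho": {"r": 0}}
IMPL = {"inputs": ["1"], "delta": {(q, "1"): (q + 1) % 4 for q in range(4)},
        "lam": {(q, "1"): "E" if q % 2 == 0 else "O" for q in range(4)},
        "rho": {"R1": 1, "R2": 2}}
print(realizes_up_to(SPEC, IMPL, {"a": "1"}, {"r": "R2"}, {"E": "0", "O": "1"}, 6))  # True
```

Choosing the reset `R1` instead of `R2` breaks the realization, since the counter would then start on an odd state.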
Example 2.3: Let $\bar M_3$ and $M_3$ be the resettable machines shown in
Fig. 2.9 and Fig. 2.10.

Fig. 2.9. Resettable Machine $\bar M_3$

Fig. 2.10. Resettable Machine $M_3$

Then $M_3$ realizes $\bar M_3$ under the triple $(\sigma_1, \sigma_2, \sigma_3)$, where $\sigma_1: (\bar I_3)^+ \to (I_3)^+$
is the identity, $\sigma_2: \bar R_3 \to R_3$ maps each reset of $\bar M_3$ to a reset of $M_3$, and
$\sigma_3: Z_3 \to \bar Z_3$ is the identity. To verify this claim we need only
observe that $\bar\beta_r(x) = \beta_{\sigma_2(r)}(x)$ for all $r \in \bar R_3$ and $x \in (\bar I_3)^+$.

Notice that the definition of realization for resettable machines
is less restrictive than that for sequential machines in the sense that
for resettable machines we only require the realizing system to
mimic the behavior of the reset states of the realized machine, while
in the sequential machine case the realizing system must mimic the
behavior of every state of the realized system. On the other hand, the
definition in the resettable case is more restrictive in the sense that
for each reset state in the realized machine not only does there exist
a state in the realizing machine which mimics its behavior, but we also
know how to get to that state.

Before proceeding with our model of on-line diagnosis we must
introduce a few notational conventions. The identity function on a
set $A$ will be denoted by $e_A$. When it is clearly understood which
set is being mapped the subscript will be deleted.

If $A_1, \ldots, A_n$ is a sequence of $n$ sets, its cartesian product is
the set $A_1 \times \cdots \times A_n = \{(x_1, \ldots, x_n) \mid x_i \in A_i,\ i = 1, \ldots, n\}$.
The cartesian product of an empty sequence of sets is taken to be any
singleton set.

Given a cartesian product $A = \times_{i=1}^{n} A_i$, a coordinate projection of $A$
is a function $P_i: A \to A_i$ defined by $P_i(x_1, \ldots, x_n) = x_i$.

If $f_1: A \to B_1, \ldots, f_n: A \to B_n$ is a sequence of functions, the
cross-product function $\times_{i=1}^{n} f_i: A \to \times_{i=1}^{n} B_i$ is defined by
$(\times_{i=1}^{n} f_i)(a) = (f_1(a), \ldots, f_n(a))$. The cross-product function can be used
to extend coordinate projections to project onto any subset of
coordinates: if $C \subseteq \{1, \ldots, n\}$ then $P_C: A \to A_C$, where $A_C = \times_{i \in C} A_i$,
is defined by $P_C = \times_{i \in C} P_i$. In particular, $P_\emptyset$ is a constant function with domain $A$.
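The projection and cross-product conventions translate directly into higher-order functions. Here is a small Python sketch; the names are illustrative, coordinates are 1-based to match the text, and we assume that $\times_{i \in C}$ lists coordinates in increasing order of $i$.

```python
def projection(i):
    """P_i: A -> A_i, with 1-based coordinates as in the text."""
    return lambda x: x[i - 1]


def cross(*fs):
    """Cross-product function: a -> (f_1(a), ..., f_n(a))."""
    return lambda a: tuple(f(a) for f in fs)


def projection_onto(C):
    """P_C as the cross product of the P_i for i in C (increasing i)."""
    return cross(*(projection(i) for i in sorted(C)))


x = ("a", "b", "c")
print(projection_onto({3, 1})(x))  # ('a', 'c')
print(projection_onto(set())(x))   # () -- P_empty is constant
```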





2.2 Resettable Systems with Faults

Our model of a "resettable system with faults" is a specialization
of Meyer's general model of a "system with faults" [29].

Informally, a "system with faults" is a system, along with
a set of potential faults of the system and a description of what
happens to the original system as the result of each fault.
The original system and the systems resulting from faults
are members of one of two prescribed classes of (formal)
systems, a "specification" class for the original system and
a "realization" class for the resulting systems. More
precisely, we say that a triple $(\mathcal{S}, \mathcal{R}, p)$ is a (system)
representation scheme if

i) $\mathcal{S}$ is a class of systems, the specification class,
ii) $\mathcal{R}$ is a class of systems, the realization class,
iii) $p: \mathcal{R} \to \mathcal{S}$ where, if $R \in \mathcal{R}$, $R$ realizes $p(R)$.

By a class of systems, in this context, we mean a class of
formal systems, i.e., a set of formally specified structures
of the same type, each having an associated behavior that is
determined by the structure [29].

In this study we are concerned with the reliable use of a system.
That is, we are concerned with degradations in structure which Meyer
calls "life defects." This is contrasted with reliable design, in which
case we would be concerned with "birth defects." Thus, in our case,
a specification is a realization and we choose a representation scheme
$(\mathcal{R}, \mathcal{R}, p)$ where $p$ is the identity function on $\mathcal{R}$.

Assuming that a faulty resettable system has the same input,
output, and reset alphabets as the fault-free system $S$, the following
class of resettable systems will suffice as a realization class:
$$\mathcal{S}(I, Z, R) = \{S' \mid S' = (I, Q', Z, \delta', \lambda', R, \rho')\}.$$





In summary, the representation scheme that we are choosing for
our study of on-line diagnosis is the scheme $(\mathcal{R}, \mathcal{R}, p)$ where
$\mathcal{R} = \mathcal{S}(I, Z, R)$ and $p$ is the identity function on $\mathcal{R}$.

In such a scheme the seemingly difficult problem of describing
faults and their results becomes relatively straightforward. Before
we state our particular notion of a fault and its results we will repeat
here Meyer's general notion of a "system with faults" [29].

A system with faults in a representation scheme
$(\mathcal{S}, \mathcal{R}, p)$ is a structure $(S, F, \phi)$ where

i) $S \in \mathcal{S}$,
ii) $F$ is a set, the faults of $S$,
iii) $\phi: F \to \mathcal{R}$ such that, for some $f \in F$,
$p(\phi(f)) = S$.

If $f \in F$, the system $S^f = \phi(f)$ is the result of $f$. If $p(S^f) = S$
then $f$ is improper (by iii), $F$ contains at least one improper
fault); otherwise it is proper. A realization $S^f$ is fault-free
if $f$ is improper; otherwise $S^f$ is faulty [29].

In applying this notion to our study we must first define what we
mean by a fault of a resettable system. Given a resettable system
$S \in \mathcal{S}(I, Z, R)$, a fault $f$ of $S$ can be regarded as a transformation of
$S$ into another system $S' \in \mathcal{S}(I, Z, R)$ at some time $\tau$. Accordingly,
the resulting faulty system looks like $S$ up to time $\tau$ and like $S'$
thereafter. Since $S$ may be in operation at time $\tau$ we must also be
concerned with the question of what happens to the state of $S$ as this
transformation takes place. We handle this with a function $\theta$ from
the state set of $S$ to that of $S'$. The interpretation of $\theta$ is that if $S$ is
in state $q$ immediately before time $\tau$ then $S'$ is in state $\theta(q)$ at time
$\tau$. More precisely,
Definition 2.7: If $S \in \mathcal{S}(I, Z, R)$, a fault of $S$ is a triple
$$f = (S', \tau, \theta)$$
where $S' \in \mathcal{S}(I, Z, R)$, $\tau \in T$, and $\theta: Q \to Q'$.

A fault $f = (S', \tau, \theta)$ of $S$ is a permanent fault if $S'$ is time-invariant.
We view the occurrence of a fault $f = (S', \tau, \theta)$ of a system $S$ as
shown in Fig. 2.11.

Fig. 2.11. A Fault $f = (S', \tau, \theta)$ of $S$


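Given this formal representation of a fault of $S$, the resulting
faulty system is defined as follows.

Definition 2.8: The result of $f = (S', \tau, \theta)$ is the system
$$S^f = (I, Q^f, Z, \delta^f, \lambda^f, R, \rho^f)$$
where $Q^f = Q \cup Q'$,
$$\delta^f(q, a, t) = \begin{cases} \delta(q, a, t) & \text{if } q \in Q \text{ and } t < \tau - 1 \\ \theta(\delta(q, a, t)) & \text{if } q \in Q \text{ and } t = \tau - 1 \\ \delta'(q, a, t) & \text{if } q \in Q' \text{ and } t \ge \tau \end{cases}$$
$$\lambda^f(q, a, t) = \begin{cases} \lambda(q, a, t) & \text{if } q \in Q \text{ and } t < \tau \\ \lambda'(q, a, t) & \text{if } q \in Q' \text{ and } t \ge \tau \end{cases}$$
$$\rho^f(r, t) = \begin{cases} \rho(r, t) & \text{if } t < \tau \\ \theta(\rho(r, t)) & \text{if } t = \tau \\ \rho'(r, t) & \text{if } t > \tau. \end{cases}$$
(Arguments not specified in the above definitions may be assigned
arbitrary values.)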


In justifying this representation of the resulting faulty system one
should regard a fault $f = (S', \tau, \theta)$ as actually occurring between time
$\tau - 1$ and $\tau$. Note that, for any fault $f$ of $S$, $S^f \in \mathcal{S}(I, Z, R)$.
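Definition 2.8 is easy to mechanize. The Python sketch below assumes that both $S$ and $S'$ are time-invariant (so their tables take no time argument) and tags states so that $Q$ and $Q'$ stay disjoint inside $Q^f$; arguments that the definition leaves unspecified raise an error rather than receiving arbitrary values. The toy machines are hypothetical: $S$ alternates 0, 1 and $S'$ freezes its state.

```python
def make_faulty(m, m2, tau, theta):
    """Definition 2.8 for time-invariant S = m and S' = m2.

    Each machine is a dict with "delta" and "lam" ((state, input) -> value)
    and "rho" (reset -> state). States of S^f are tagged ("S", q) or ("S'", q).
    """
    def delta_f(state, a, t):
        tag, q = state
        if tag == "S" and t < tau - 1:
            return ("S", m["delta"][(q, a)])
        if tag == "S" and t == tau - 1:
            return ("S'", theta[m["delta"][(q, a)]])   # the fault strikes
        if tag == "S'" and t >= tau:
            return ("S'", m2["delta"][(q, a)])
        raise ValueError("argument left unspecified by Definition 2.8")

    def lam_f(state, a, t):
        tag, q = state
        if tag == "S" and t < tau:
            return m["lam"][(q, a)]
        if tag == "S'" and t >= tau:
            return m2["lam"][(q, a)]
        raise ValueError("argument left unspecified by Definition 2.8")

    def rho_f(r, t):
        if t < tau:
            return ("S", m["rho"][r])
        if t == tau:
            return ("S'", theta[m["rho"][r]])
        return ("S'", m2["rho"][r])

    return delta_f, lam_f, rho_f


def beta_hat_f(machine_f, r, t, x):
    """Outputs of S^f for reset r released at time t; x[k] is applied at t + k."""
    delta_f, lam_f, rho_f = machine_f
    q, out = rho_f(r, t), []
    for k, a in enumerate(x):
        out.append(lam_f(q, a, t + k))
        q = delta_f(q, a, t + k)
    return out


M = {"delta": {(0, "a"): 1, (1, "a"): 0}, "lam": {(0, "a"): "0", (1, "a"): "1"}, "rho": {"r": 0}}
M2 = {"delta": {(0, "a"): 0, (1, "a"): 1}, "lam": M["lam"], "rho": {"r": 0}}
fault = make_faulty(M, M2, tau=3, theta={0: 0, 1: 1})
print(beta_hat_f(fault, "r", 0, "aaaaaa"))  # ['0', '1', '0', '1', '1', '1']
```

The fault-free machine would emit 0, 1, 0, 1, 0, 1, so the first out-of-tolerance output appears at time 4.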

Example 2.4: Recall that in Example 2.2 $M_1$ was transformed into
$M_1'$ at time 100. We would say now that $f = (M_1', 100, e)$ is a permanent
fault of $M_1$ and that $S$ is the result of $f$ (i.e., $S = M_1^f$).

Example 2.5: Again consider $M_1$ as implemented by the circuit in
Fig. 2.2, and let $g$ be the fault which is caused by $d_1$ becoming stuck-at-1
at time 50. Then $g = (M_1'', 50, \theta)$ is a permanent fault of $M_1$, where $M_1''$
is the machine shown in Fig. 2.12 and $\theta: Q \to Q''$ is defined by the
table

    q  | θ(q)
  -----+------
    00 |  10
    01 |  11
    10 |  10
    11 |  11

Fig. 2.12. Resettable Machine $M_1''$

$M_1^g$ will behave as $M_1$ up to time 50, and thereafter it will produce a
constant sequence of 1's.
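Example 2.5 can be replayed in a few lines of Python. The transition table of $M_1$ and $\theta$ are taken from Fig. 2.3 and the table above; the model of $M_1''$ is an assumption (the stuck line forces the first bit of the next state to 1 and leaves the output unchanged), chosen because it agrees with $\theta$. Since the reset state is not legible in Fig. 2.3, the check is run from every state at time 0.

```python
# M_1 (Fig. 2.3): state -> (next_state, output) for the single input symbol 1.
M1 = {"00": ("01", "0"), "01": ("11", "1"), "10": ("00", "1"), "11": ("10", "1")}
# theta from the table of Example 2.5.
THETA = {"00": "10", "01": "11", "10": "10", "11": "11"}


def m1_double_prime(q):
    """Assumed model of M_1'': first next-state bit forced to 1, output unchanged."""
    nxt, z = M1[q]
    return "1" + nxt[1], z


def outputs(q, steps, tau=None):
    """Output sequence from state q at time 0; if tau is given, g occurs at tau."""
    out = []
    for t in range(steps):
        if tau is None or t < tau:
            nxt, z = M1[q]
            if tau is not None and t == tau - 1:
                nxt = THETA[nxt]  # state mapping at the fault (Definition 2.8)
        else:
            nxt, z = m1_double_prime(q)
        out.append(z)
        q = nxt
    return out


for q0 in M1:
    faulty, good = outputs(q0, 60, tau=50), outputs(q0, 60)
    assert faulty[:50] == good[:50] and set(faulty[50:]) == {"1"}
```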

To complete the model, a resettable system with faults, in this
representation scheme, is a structure
$$(S, F, \phi)$$
where $S \in \mathcal{S}(I, Z, R)$, $F$ is a set of faults of $S$ including at least one
improper fault (e.g., $f = (S, 0, e)$), and $\phi: F \to \mathcal{S}(I, Z, R)$ where $\phi(f) = S^f$
for all $f \in F$. Given this definition, we can drop the explicit
reference to $\phi$ in denoting a resettable system with faults, i.e., $(S, F)$ will
mean $(S, F, \phi)$ where $\phi$ is as defined above.
In the remainder of this study we will be dealing exclusively with
resettable systems. Thus we will refer to resettable systems simply
as systems and to resettable machines as machines.

A word is in order about our definition of faults. The
interpretation here is one of effect, not cause; e.g., we don't talk of stuck-at-1
OR gates but rather of the system which is created due to some presumed
physical cause. We will refer to these physical causes as component
failures or simply as failures. A fault, by our definition, consists of
precisely that information which is needed to define the system which
results from the fault. This allows us to treat faults in the abstract,
independent of specific network realizations of the system and without
reference to the technology employed in this realization and the types
of failures which are possible with this technology. We are assured,
however, that for each fault we have enough information to assess the
structural and behavioral effects of the fault, in particular as these
effects relate to fault diagnosis and tolerance.

There are limits, however, to how much can be done with a purely
effect-oriented concept of faults. When a system is sufficiently structured
to allow a reasonable notion of what may cause a fault we certainly will
want to make use of this notion. When this is the case we may, through
an abuse of language, refer to a specific failure at time $\tau$ as a fault.
What we will mean is that we have stated a cause of a fault and that there
is a unique fault which is the result of this failure at time $\tau$.
It is interesting to see what the scope of our definition of fault is
in terms of the types of failures which will result in faults. Recall that
a fault $f$ of a system $S$ is a triple, $f = (S', \tau, \theta)$, where $S' \in \mathcal{S}(I, Z, R)$.
Thus $S'$ is a (resettable) system with the same input, output, and reset
alphabets as $S$. The previous sentence contains, implicitly, every
restriction that we have put on faults. First of all, $S'$ is a (resettable)
system. Thus it remains within our universe of discourse. In
particular, its reset inputs still act like reset inputs. That is, they cause
$S'$ to go into a particular state regardless of the state it was in when the
reset input was applied. The restrictions on the input, output, and
reset alphabets are reasonable since after a fault occurs the system
presumably will have the same input and output terminals as it had
before the fault occurred.

Let $f = (S', \tau, \theta)$ be a fault. Because $S'$ may vary with time we have
considerable latitude in the types of failures which we may consider.
In particular, we may consider simultaneous permanent failures in one
or more components, simultaneous intermittent failures in one or more
components, or any combination of the above occurring at the same or
varying times. For example, a fault $f$ may be caused by an AND gate
becoming stuck-at-1 at time $\tau_1$, followed by an OR gate becoming
stuck-at-0 at a later time $\tau_2$.

Let us now compute the behavior of $S^f$ in state $q$. Let $x = a_1 \ldots a_n \in I^+$. Then
$$\beta^f_q(x, t) = \lambda^f(\delta^f(q, a_1 \ldots a_{n-1}, t), a_n, t + n - 1).$$
There are three cases which must be considered.

Case i) $q \in Q$ and $t + n - 1 < \tau$. Then
$$\beta^f_q(x, t) = \lambda(\delta(q, a_1 \ldots a_{n-1}, t), a_n, t + n - 1) = \beta_q(x, t).$$

Case ii) $q \in Q$, $t + n - 1 \ge \tau$, and $t < \tau$. Say $t + n - m = \tau$. Then
$$\begin{aligned}
\beta^f_q(x, t) &= \lambda'(\delta'(\theta(\delta(q, a_1 \ldots a_{n-m}, t)), a_{n-m+1} \ldots a_{n-1}, t + n - m), a_n, t + n - 1) \\
&= \beta'_{\theta(\delta(q, a_1 \ldots a_{n-m}, t))}(a_{n-m+1} \ldots a_n, t + n - m) \\
&= \beta'_{\theta(\delta(q, y, t))}(z, \tau)
\end{aligned}$$
where $y = a_1 \ldots a_{n-m}$ and $z = a_{n-m+1} \ldots a_n$.

Case iii) $q \in Q'$ and $t \ge \tau$. Then
$$\beta^f_q(x, t) = \lambda'(\delta'(q, a_1 \ldots a_{n-1}, t), a_n, t + n - 1) = \beta'_q(x, t).$$

Thus we have proved:
Theorem 2.1: Let $S$ be a system and $f = (S', \tau, \theta)$ a fault of $S$. Then for
each $t \in T$ and $x \in I^+$
$$\beta^f_q(x, t) = \begin{cases} \beta_q(x, t) & \text{if } q \in Q \text{ and } t + |x| \le \tau \\ \beta'_{\theta(\delta(q, y, t))}(z, \tau) & \text{if } q \in Q,\ t + |x| > \tau, \text{ and } t < \tau, \text{ where } x = yz \text{ and } |y| = \tau - t \\ \beta'_q(x, t) & \text{if } q \in Q' \text{ and } t \ge \tau. \end{cases}$$
(As in the definitions of $\delta^f$ and $\lambda^f$, arguments not specified may be
assigned arbitrary values.)

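Corollary 2.1.1: Let $S$ be a system and $f = (S', \tau, \theta)$ a fault of $S$. Then
for each $r \in R$, $t \in T$, and $x \in I^+$
$$\beta^f_{r,t}(x) = \begin{cases} \beta_{r,t}(x) & \text{if } t + |x| \le \tau \\ \beta'_{\theta(\delta(\rho(r,t), y, t))}(z, \tau) & \text{if } t + |x| > \tau \text{ and } t \le \tau, \text{ where } x = yz \text{ and } |y| = \tau - t \\ \beta'_{r,t}(x) & \text{if } t > \tau. \end{cases}$$

Proof: By its definition
$$\beta^f_{r,t}(x) = \beta^f_{\rho^f(r,t)}(x, t).$$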

Again we have three cases to consider.

Case i) $t + |x| \le \tau$. Then $t < \tau$ and $\rho^f(r, t) = \rho(r, t) \in Q$.
Therefore by Theorem 2.1
$$\beta^f_{\rho^f(r,t)}(x, t) = \beta_{\rho(r,t)}(x, t) = \beta_{r,t}(x).$$

Case ii) $t + |x| > \tau$ and $t \le \tau$. If $t < \tau$ then $\rho^f(r, t) = \rho(r, t) \in Q$,
and Case ii) of Theorem 2.1 applies with $\rho(r, t)$ in place of $q$. If
$t = \tau$ then $\rho^f(r, t) = \theta(\rho(r, t)) \in Q'$ and Case iii) of the theorem
applies, giving us
$$\beta^f_{\rho^f(r,t)}(x, t) = \beta'_{\theta(\rho(r,t))}(x, t) = \beta'_{\theta(\delta(\rho(r,t), \Lambda, t))}(x, \tau),$$
where $\Lambda$ denotes the empty sequence (so that $y = \Lambda$ and $z = x$).

Case iii) $t > \tau$. In this case $\rho^f(r, t) = \rho'(r, t) \in Q'$. Therefore
$$\beta^f_{\rho^f(r,t)}(x, t) = \beta'_{\rho'(r,t)}(x, t) = \beta'_{r,t}(x).$$


We have noted that we will often be interested in the physical cause
of a fault. For example, in a network realization of a machine we may
be interested in faults which are caused by a specific NAND gate
becoming stuck-at-1. Since this gate failure results in different faults
as we consider it occurring at different times, it seems natural to give
a name to this family of faults. More generally, we will define an
equivalence relation on a set of faults such that a family of faults such as
we have just mentioned will be an equivalence class.

First we must define an equivalence relation on $\mathcal{S}(I, Z, R)$ such
that two systems $S, S' \in \mathcal{S}(I, Z, R)$ are equivalent if they are identical
except for a shift in time.

Definition 2.9: Let $S, S' \in \mathcal{S}(I, Z, R)$. $S'$ is an $n$-translation of $S$ if
$Q = Q'$ and for all $q \in Q$, $a \in I$, $r \in R$, and $t \in T$

i) $\delta(q, a, t) = \delta'(q, a, t + n)$
ii) $\lambda(q, a, t) = \lambda'(q, a, t + n)$
iii) $\rho(r, t) = \rho'(r, t + n)$.


If $S'$ is an $n$-translation of $S$ then it can be shown that for all $q \in Q$,
$r \in R$, $x \in I^+$, and $t \in T$
$$\beta_q(x, t) = \beta'_q(x, t + n)$$
and
$$\beta_{r,t}(x) = \beta'_{r,t+n}(x).$$




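Definition 2.10: Let $(S, F)$ be a system with faults and let $f_1 = (S_1, \tau_1, \theta_1)$
and $f_2 = (S_2, \tau_2, \theta_2)$ be faults in $F$. Then $f_1$ is equivalent to $f_2$ ($f_1 \equiv f_2$) if
$S_1$ is a $(\tau_1 - \tau_2)$-translation of $S_2$ and $\theta_1 = \theta_2$.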
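Theorem 2.2: The above relations are equivalence relations.

Proof: The relation of "$n$-translation" (for some $n$) is an equivalence relation on
$\mathcal{S}(I, Z, R)$ because "=" is an equivalence relation: every system is a 0-translation
of itself, if $S'$ is an $n$-translation of $S$ then $S$ is a $(-n)$-translation of $S'$, and an
$m$-translation of an $n$-translation of $S$ is an $(n + m)$-translation of $S$. The relation
"$\equiv$" on a set of faults of a system is an equivalence relation because
"$n$-translation" and "=" are both equivalence relations.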


Notation: We denote the equivalence class in $F$ which contains the
fault $f = (S', \tau, \theta)$ by $[f]_F$. When the class of faults is clear we will drop
the $F$. Generally, if $F$ is not mentioned we take it to be the set of all
possible faults of a system $S$. We let $f_i = (S_i, i, \theta)$ denote the fault in
$[f]$ which occurs at time $i$. When dealing with behaviors, $\beta^{f_i}$ will denote
the behavior of $S^{f_i}$, and $\beta^i$ will denote the behavior of $S_i$.

Let $f_i = (S_i, i, \theta)$ and $f_j = (S_j, j, \theta)$ be equivalent faults of a machine
$M$. Since $M$ is an $(i - j)$-translation of itself, it can be verified directly
from Definition 2.8 that $M^{f_i}$ is an $(i - j)$-translation of $M^{f_j}$. Hence,


Theorem 2.3: Let $f$ be a fault of $M$ and let $f_i, f_j \in [f]$. Then for all
$q \in Q$, $x \in I^+$, $r \in R$, and $t \in T$
$$\beta^{f_i}_q(x, t + i) = \beta^{f_j}_q(x, t + j)$$
and
$$\beta^{f_i}_{r,t+i}(x) = \beta^{f_j}_{r,t+j}(x).$$
In this section we have defined and studied the notion of a fault
of a system. In the remainder of this study we shall limit our
investigations to the case in which the fault-free system is time-invariant.
That is, we shall be studying faults of machines. This is not a serious
restriction since the behavior of (fault-free) computers and related
digital equipment does not vary with time. Nevertheless, the concepts
developed in this and the preceding section are necessary since faulty
machines (except in the case of improper faults) are time-varying.
Given a fault $f = (S', \tau, \theta)$ of a machine $M$, $S'$ will not be restricted
to being time-invariant. This allows us to consider intermittent faults.
2.3 Fault Tolerance and Errors

Given a system with faults $(S, F)$ and a proper fault $f \in F$, an
immediate question is whether the faulty system $S^f$ is usable in the
sense that its behavior resembles, within acceptable limits, that of the
fault-free system $S$. We will use the general notion of a "tolerance
relation" [29] to make more precise what is meant by "acceptable
limits." A tolerance relation for a representation scheme $(\mathcal{S}, \mathcal{R}, p)$ is
a relation $\gamma$ between $\mathcal{R}$ and $\mathcal{S}$ ($\gamma \subseteq \mathcal{R} \times \mathcal{S}$) such that, for all $R \in \mathcal{R}$,
$(R, p(R)) \in \gamma$ (i.e., $p \subseteq \gamma$). In this section we will develop the
particular notions of "acceptable limits" that we will be using in this study of
on-line diagnosis.

Given a machine $M$ it will be understood that $M$ realizes a specific
reduced and reachable machine $\bar M$ under the triple $(\sigma_1, \sigma_2, \sigma_3)$. Under
the intended interpretation, $\bar M$ serves as the specification of some
desired behavior and $M$ serves as the fault-free realization of this
behavior. This relationship between $M$ and $\bar M$ will underlie our basic
notions of fault tolerance, error, and on-line diagnosis.

In this study we will only be concerned with the behavior of $M$
under those resets and inputs which correspond via $\sigma_2$ and $\sigma_1$ to resets
and inputs of $\bar M$. No requirements will ever be put on $\beta_r(x)$ or $\beta^f_r(x)$,
where $f$ is a fault of $M$, if $r \notin \sigma_2(\bar R)$ or $x \notin \sigma_1(\bar I^+)$, because these are
considered to be "non-code space resets" and "non-code space inputs."
For this reason we will always assume that $\sigma_1$ and $\sigma_2$ are onto. In
actually dealing with machines for which $\sigma_1$ or $\sigma_2$ is not onto, occurrences
of "non-code space resets" and "non-code space inputs" could be
ignored or they could be treated as errors which must be detected.
These two options correspond to Carter and Schneider's [7] Don't
Care Assignments 1 and 2.

We will be using two basic notions of fault tolerance. The first,
and weaker, corresponds to the preservation of the behavior of $M$
only insofar as its mimicking of $\bar M$ is concerned.


Definition 2.11: Let $f$ be a fault of a machine $M$. Then $f$ is 1-tolerated
by $M$ for resets at time $t$ if for all $r \in \bar R$
$$\bar\beta_r = \sigma_3 \circ \beta^f_{\sigma_2(r),t} \circ \sigma_1.$$
Alternatively, since $\sigma_1$ and $\sigma_2$ are onto and since
$\bar\beta_r = \sigma_3 \circ \beta_{\sigma_2(r)} \circ \sigma_1$, $f$ is 1-tolerated by $M$ for resets at time $t$ if for
all $r \in R$
$$\sigma_3 \circ \beta_r = \sigma_3 \circ \beta^f_{r,t}.$$
In the special case where $f$ is 1-tolerated by $M$ for resets at time
0, we will simply say that $f$ is 1-tolerated by $M$.

The second, and stronger, notion of tolerance does not allow for
the tolerance of any change in behavior.

Definition 2.12: Let $f$ be a fault of a machine $M$. Then $f$ is 2-tolerated
by $M$ for resets at time $t$ if for all $r \in R$
$$\beta^f_{r,t} = \beta_r.$$
Again, $f$ is 2-tolerated by $M$ if it is 2-tolerated by $M$ for resets
at time 0.

Our definition of 1-tolerated induces a relation $\gamma_1$ on $\mathcal{R}$ where
$M^f \gamma_1 M$ if and only if $f$ is 1-tolerated by $M$. If $f$ is improper then $M^f = M$
and thus $f$ is 1-tolerated by $M$. Hence $M \gamma_1 M$, and therefore $\gamma_1$ is
a tolerance relation. Likewise, 2-tolerated induces a tolerance relation
$\gamma_2$. If $f$ is 2-tolerated by $M$ then we can see that $f$ is 1-tolerated by $M$.
Hence, as sets, $\gamma_2 \subseteq \gamma_1$. Finally, note that if $\sigma_3$ is 1-1 and $f$ is
1-tolerated by $M$ then $f$ is 2-tolerated by $M$.

Example 2.6: Let $M$ be the realization of $\bar M$ which consists of 3 copies
of $\bar M$, a voter, and a disagreement detector, as shown in Fig. 2.13. Then
any fault $f$ which affects only one copy of $\bar M$ is 1-tolerated but may not
be 2-tolerated, and its presence may be detected by the disagreement
detector.
Fig. 2.13. Triple Modular Redundancy with Voting
and Disagreement Detecting
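The voter and disagreement detector of Fig. 2.13 can be sketched for binary outputs as below (Python; a hypothetical behavioral model, not the circuit of the figure). Taking $\sigma_3$ to keep only the voted output illustrates why a fault confined to one copy is 1-tolerated but not 2-tolerated: the disagreement flag changes the full output of $M$.

```python
def tmr_step(z1, z2, z3):
    """One step: majority vote of three binary outputs plus a disagreement flag."""
    voted = 1 if z1 + z2 + z3 >= 2 else 0
    disagree = 0 if z1 == z2 == z3 else 1
    return voted, disagree


def tmr_run(copy1, copy2, copy3):
    return [tmr_step(*zs) for zs in zip(copy1, copy2, copy3)]


def sigma3(pair):
    """Assumed sigma_3: keep the voted output, drop the disagreement flag."""
    return pair[0]


good = [0, 1, 0, 1, 1]
stuck = [1, 1, 1, 1, 1]   # one copy with a hypothetical stuck-at-1 output
healthy = tmr_run(good, good, good)
faulty = tmr_run(good, good, stuck)

assert [sigma3(p) for p in faulty] == good   # no 1-error on this run
assert faulty != healthy                     # but the full output differs
print([flag for _, flag in faulty])          # [1, 0, 1, 0, 0]
```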

Our definitions of 1- and 2-tolerated by $M$ for resets at time $t$
are refined notions of fault tolerance. Coarser notions, and ones more
in keeping with the literature, would be behavioral equivalence for
resets at any time. We prefer our finer definitions, for with them the
effects of time can be more naturally analyzed. One question which
we will study later is: for resets at how many (and which) times must
a fault be tolerated for it to be tolerated for resets at any time?

When a discussion or theorem applies equally well to 1-tolerated
and to 2-tolerated we will just use the general term "tolerated." We
also do this later in this section when we discuss "errors."








It would be convenient if, without loss of generality, it were
possible to consider the behavior of systems only for resets released
at time 0. The following result shows that this can be done by a simple
change in the fault set under consideration.

Theorem 2.4: Let $f = (S', \tau, \theta)$ be a fault of machine $M$. Then $f$ is
tolerated by $M$ for resets at time $t$ if and only if $f_{\tau - t}$ is tolerated by $M$.

Proof: By Theorem 2.3, $\beta^f_{r,t} = \beta^{f_{\tau-t}}_{r,0}$. Hence $\sigma_3 \circ \beta^f_{r,t} = \sigma_3 \circ \beta^{f_{\tau-t}}_{r,0}$,
and $\sigma_3 \circ \beta_r = \sigma_3 \circ \beta^f_{r,t}$ if and only if $\sigma_3 \circ \beta_r = \sigma_3 \circ \beta^{f_{\tau-t}}_r$. This
establishes the result.

Thus a fault $f$ is tolerated by $M$ for resets at any time if and
only if every fault in the class $[f]$ of faults equivalent to $f$ is tolerated by $M$.
Due to this, we will always consider resets to be released at time 0 when
dealing with fault tolerance of machines, and no generality will be lost.
Clearly, due to Theorem 2.3, this same sort of time translation can be
applied to any other behavioral attribute.

Example 2.7: Let $M_4$ be the sequence generator shown in Fig. 2.14.
This machine could be implemented by the circuit shown in Fig. 2.15.

Let $f$ be a fault of $M_4$ which is caused by a line of the circuit in Fig. 2.15
becoming stuck-at-1 at time $\tau$. Then $f = (M_4', \tau, \theta)$ where $M_4'$ is the
machine represented by the graph in Fig. 2.16 and $\theta$ is as indicated below.

Fig. 2.16. Machine $M_4'$

Consider $f_{-1}$, i.e., the fault $(M_4', -1, \theta)$, and note that $\beta^{f_{-1}}_r(11) = 1$
whereas $\beta_r(11) = 0$. Thus $f_{-1}$ is not 2-tolerated by $M_4$. On the other
hand, both $M_4$ and $M_4^{f_{-1}}$ will produce the sequence 00010101... when
reset at $-10$. Thus $f_{-1}$ is 2-tolerated by $M_4$ for resets at $-10$. By
applying Theorem 2.4 we can learn, for example, that $f_i$ is not
2-tolerated by $M_4$ for resets at time $i + 1$ and that $f_9$ is 2-tolerated by $M_4$.

Corresponding to our two types of fault tolerance we can define 
two types of errors. 
Definition 2.13: Let $M$ be a machine, $r \in R$, $x \in I^+$, and $y \in Z^+$ where
$|x| = |y|$. The triple $(r, x, y)$ is called a 1-error (2-error) of $M$ if
$\sigma_3(\hat\beta_r(x)) \ne \sigma_3(y)$ ($\hat\beta_r(x) \ne y$).

If $(r, x, y)$ is an error of $M$ and $f$ is a fault of $M$ for which
$\hat\beta^f_r(x) = y$ then we say that the fault $f$ causes the error $(r, x, y)$. Note
that any given error could be caused by many different faults.

The relation between fault tolerance and errors is very simple.
A fault $f$ is 1-tolerated (2-tolerated) if and only if it causes no 1-errors
(2-errors). The relation between 1-errors and 2-errors is also
straightforward. Namely, every 1-error is a 2-error, and if $\sigma_3$ is
1-1 then every 2-error is a 1-error. Errors are very important in
any study of fault diagnosis because a fault can never be detected until
it causes an error. The general goal of on-line diagnosis is protection
against undesirable behavioral manifestations of faults, i.e.,
protection against errors.

Since an error can represent erroneous behavior of any
duration, and since we will wish to detect erroneous behavior when it
first begins to appear, we introduce the concept of a "minimal error."
Informally, an error $(r, x, y)$ is a minimal error if only the last
symbol of the output sequence $y$ is out of tolerance. More formally,
an error $(r, ua, vb)$, where $a \in I$ and $b \in Z$, is a minimal error if
$(r, u, v)$ is not an error. If $(r, x, y)$ is a minimal 1-error then it is
a 2-error but not necessarily a minimal 2-error. A minimal error
$(r, x, y)$ is said to occur at time $|x| - 1$. This is the time at which
the last symbol in $y$ is emitted.
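Locating a minimal error amounts to finding the first output position that is out of tolerance. In the Python sketch below, `expected` stands for $\hat\beta_r(x)$ and `observed` for $\hat\beta^f_r(x)$; extending $\sigma_3$ symbol by symbol to sequences is an assumption of the sketch, and the function names are illustrative. The example reproduces the remark above: the minimal 1-error occurs at time 2, while the first 2-error occurs already at time 0.

```python
def first_error_time(expected, observed, sigma3=None):
    """Time k of the minimal error (first out-of-tolerance position), or None.

    With sigma3=None this compares exactly (2-errors); with a dict sigma3 it
    compares sigma_3-images symbol by symbol (1-errors). An observed symbol
    outside the domain Z' maps to None and so differs from any image in Z'.
    """
    if len(expected) != len(observed):
        raise ValueError("an error (r, x, y) requires |x| = |y|")
    image = (lambda z: z) if sigma3 is None else sigma3.get
    for k, (e, o) in enumerate(zip(expected, observed)):
        if image(e) != image(o):
            return k
    return None


def minimal_error(r, x, expected, observed, sigma3=None):
    """The minimal error (r, x[:k+1], y[:k+1]), which occurs at time k."""
    k = first_error_time(expected, observed, sigma3)
    return None if k is None else (r, x[: k + 1], observed[: k + 1])


parity = {"0": "e", "1": "o", "2": "e", "3": "o"}   # hypothetical sigma_3
print(first_error_time("0123", "2133"))             # 0: first 2-error
print(first_error_time("0123", "2133", parity))     # 2: first 1-error
```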

Often we will be in a situation where we are concerned with a
machine $M$ tolerating a set of faults which are all caused by the same
phenomenon but which may occur at any time. More specifically, let
$f$ be a fault of $M$. We would like results which assure us that if some
finite subset of $[f]$ is tolerated by $M$ then all of $[f]$ is tolerated by
$M$. Later we will be interested in the same problem with regard to
diagnosis.

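Our first result of this nature hinges on the fact that any reachable
state of an $\ell$-reachable machine is reachable by time $\ell$.

Theorem 2.5: Let $f$ be a fault of an $\ell$-reachable machine $M$ and suppose
$f_i$ is tolerated by $M$ for $0 \le i \le \ell$. Then $f_i$ is tolerated by $M$ for all
$i \ge 0$.

Proof: Assume, to the contrary, that $f_i$ is not tolerated by $M$ for some
$i > \ell$. Then there exists an error $(r, x, y)$ which is caused by $f_i$.
Hence $\hat\beta^{f_i}_r(x) = y$. By Corollary 2.1.1, $f_i$ does not affect outputs emitted
before time $i$, so $|x| > i$. Let $x = x_1 x_2$ and $y = y_1 y_2$ where $|x_1| = |y_1| = i$.
By Corollary 2.1.1 we know that
$$y_1 = \hat\beta_r(x_1) \quad\text{and}\quad y_2 = \hat\beta^i_{\theta(\delta(\rho(r), x_1))}(x_2, i).$$
Let $q = \delta(\rho(r), x_1)$. Since $M$ is $\ell$-reachable, there exist $s \in R$ and
$u \in I^*$ such that $|u| = j \le \ell$ and $\delta(\rho(s), u) = q$. By Theorem 2.3,
$\hat\beta^i_{\theta(q)}(x_2, i) = \hat\beta^j_{\theta(q)}(x_2, j)$. Therefore, if $\hat\beta_s(u) = v$ then
$$\hat\beta^{f_j}_s(u x_2) = \hat\beta_s(u)\,\hat\beta^j_{\theta(\delta(\rho(s), u))}(x_2, j) = v\,\hat\beta^j_{\theta(q)}(x_2, j) = v y_2.$$
Since $y_1 = \hat\beta_r(x_1)$, the fact that $(r, x, y)$ is an error means that $\hat\beta_q(x_2)$
differs from $y_2$ (after applying $\sigma_3$, in the case of 1-errors). Hence $(s, u x_2, v y_2)$
is an error, and it is caused by $f_j$. Therefore $f_j$ is not tolerated.
Contradiction. This establishes the result.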

The following general example shows that Theorem 2. 5 is the 
strongest result possible, in the sense that if the hypothesis is at all 
weakened then there exists a fault f and a machine M for which the 
conclusion is invalid. 

Example 2.8: Consider the $\ell$-reachable autonomous machine $M$ shown in Fig. 2.17. Let $m$ be an integer between 0 and $\ell$ inclusive, and let $f = (M', \tau, \varphi)$ be the fault of $M$ specified with Fig. 2.17. Consider $M$ to be realizing itself; that is, take $\tilde{M} = M$.

The occurrence of $f = (M', \tau, \varphi)$ has an effect on the behavior of $M$ if and only if $M$ could be in state $q_m$ at time $\tau$. Therefore $f_i = (M', i, \varphi)$ is tolerated by $M$ if and only if $i \not\equiv m \pmod{\ell + 1}$. Hence the fact that $f_i$ is tolerated by $M$ for $i = 0, \ldots, m-1, m+1, \ldots, \ell$ does not imply that $f_i$ is tolerated by $M$ for all $i \ge 0$. Since both $m$ and $\ell$ were arbitrarily chosen, this general example shows that the hypothesis of Theorem 2.5 cannot be weakened.
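The effect described in Example 2.8 is easy to reproduce numerically. Because the fault itself was specified with Fig. 2.17, the following Python sketch assumes one concrete fault with the stated property (all choices and names are hypothetical): $M$ is the autonomous cycle $q_0 \to q_1 \to \cdots \to q_\ell \to q_0$ with a single reset to $q_0$, each state $q_j$ emits $j$, $M' = M$, and $\varphi$ moves $q_m$ to $q_{(m+1) \bmod (\ell+1)}$ while fixing every other state. With $\ell \ge 1$, $f_i$ then changes the output exactly when $i \equiv m \pmod{\ell + 1}$.

```python
def outputs(ell, m, i, horizon):
    """Output sequence of the cycle machine under f_i (i=None means fault-free).

    Hypothetical instance of Example 2.8: states 0..ell on a cycle, lambda(q) = q,
    M' = M, and phi sends state m to m+1 (mod ell+1), fixing all other states.
    """
    n = ell + 1
    q, out = 0, []                      # the single reset puts M in q_0 at time 0
    for t in range(horizon):
        if t == i and q == m:           # the fault strikes at time i; phi only moves q_m
            q = (m + 1) % n
        out.append(q)                   # the output of state q_j is j
        q = (q + 1) % n                 # autonomous transition around the cycle
    return out


def tolerated(ell, m, i):
    """f_i is tolerated iff its outputs never differ from the fault-free outputs."""
    horizon = i + 2 * (ell + 1)         # more than enough steps to expose a change
    return outputs(ell, m, i, horizon) == outputs(ell, m, None, horizon)


# With ell = 3 and m = 1, only f_1, f_5, f_9, ... fail to be tolerated.
print([i for i in range(10) if not tolerated(3, 1, i)])   # [1, 5, 9]
```

Here $f_0$, $f_2$, and $f_3$ are tolerated (every $i \le \ell$ except $m$), yet $f_5$ is not, which is why Theorem 2.5 needs every $i$ with $0 \le i \le \ell$.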

Let us now look at faults which occur before time 0. We have not mentioned this case in the previous result because, if $f_i$ and $f_j$ are equivalent faults and $i$ or $j$ is less than 0, then there is, in general, no relation between the behaviors of $M^{f_i}$ and $M^{f_j}$ for resets released at time 0. However, in the important special case where $f = (M', \tau, \varphi)$ is a permanent fault, every $f_i \in [f]$ with $i < 0$ causes identical behavior with respect to resets released at time 0.

Lemma 2.1: Let $f = (M', \tau, \varphi)$ be a permanent fault of $M$. Then $\beta_r^{f_i} = \beta_r^{f_j}$ for all $r \in R$ and $i, j < 0$.

Proof: Let $i, j < 0$. Because $f$ is permanent, $f_i = (M', i, \varphi)$ and $f_j = (M', j, \varphi)$. By Corollary 2.1.1, $\beta_r^{f_i} = \beta'_r$ and $\beta_r^{f_j} = \beta'_r$ for all $r \in R$. This establishes the result.
Theorem 2.6: Let $f$ be a permanent fault of an $\ell$-reachable machine $M$. If $f_i$ is tolerated by $M$ for $-1 \le i \le \ell$ then $f_i$ is tolerated by $M$ for all $i \in T$.

Proof: By Lemma 2.1, $\beta_r^{f_i} = \beta_r^{f_{-1}}$ for all $r \in R$ and $i < 0$. Hence, if $f_{-1}$ is tolerated by $M$, then $f_i$ is tolerated by $M$ for all $i < 0$. By Theorem 2.5, $f_i$ is tolerated by $M$ for all $i \ge 0$. This establishes the result.

Before leaving this line of development we make some final observations. Note that a machine $M$ is 0-reachable if and only if $\rho(R) = P$. In particular, every memoryless machine is 0-reachable. By Theorem 2.5, if $M$ is 0-reachable and $f_0$ is tolerated by $M$ then $f_i$ is tolerated by $M$ for all $i \ge 0$.
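Both $\ell$-reachability and the condition $\rho(R) = P$ can be checked mechanically. A minimal Python sketch, assuming a time-invariant machine whose transition function is a dictionary `delta[(q, a)]` and whose reset states $\rho(R)$ are listed explicitly (the name `reach_depth` is hypothetical): a breadth-first search from $\rho(R)$ gives each reachable state's shortest distance from a reset, the largest such distance is the least $\ell$ for which $M$ is $\ell$-reachable, and it is 0 exactly when $\rho(R) = P$.

```python
from collections import deque


def reach_depth(delta, inputs, reset_states):
    """Return (P, ell) for a time-invariant machine.

    P is the set of reachable states and ell is the least integer such that every
    state in P is reached from some reset state by an input sequence of length <= ell.
    The machine is then l-reachable exactly for l >= ell.
    """
    depth = {q: 0 for q in reset_states}      # rho(R) is reached at time 0
    frontier = deque(depth)
    while frontier:                           # multi-source breadth-first search
        q = frontier.popleft()
        for a in inputs:
            nxt = delta[(q, a)]
            if nxt not in depth:
                depth[nxt] = depth[q] + 1
                frontier.append(nxt)
    return set(depth), max(depth.values())


# A 4-state cycle with one input symbol and a single reset to state 0.
cycle = {(q, "a"): (q + 1) % 4 for q in range(4)}
print(reach_depth(cycle, ["a"], [0]))            # ({0, 1, 2, 3}, 3)
print(reach_depth(cycle, ["a"], [0, 1, 2, 3]))   # ({0, 1, 2, 3}, 0)
```

Adding a reset for every state turns the 3-reachable cycle into a 0-reachable machine, matching the observation that $M$ is 0-reachable if and only if $\rho(R) = P$.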

If $f = (M', \tau, \varphi)$ is a fault of $M$, we think of $f$ as affecting the reset mechanism of $M$ if $\rho'(r) \ne \varphi(\rho(r))$ for some $r \in R$. If this is not the case then a further result, similar to Lemma 2.1, can be obtained.

Lemma 2.2: Let $f = (M', \tau, \varphi)$ be a permanent fault of $M$ and suppose that $\rho'(r) = \varphi(\rho(r))$ for all $r \in R$. Then $\beta_r^{f_i} = \beta_r^{f_j}$ for all $r \in R$ and $i, j \le 0$.

Proof: Since $\rho'(r) = \varphi(\rho(r))$, by Corollary 2.1.1, $\beta_r^{f_0} = \beta'_r$ for all $r \in R$. The result now follows just as in the proof of Lemma 2.1.

Putting the above observations together yields:

Theorem 2.7: Let $f = (M', \tau, \varphi)$ be a permanent fault of $M$. Suppose that $\rho'(r) = \varphi(\rho(r))$ for all $r \in R$ and that $\rho(R) = P$. If $f_i$ is tolerated by $M$ for any $i \le 0$ then $f_i$ is tolerated by $M$ for all $i \in T$.

Proof: By Lemma 2.2, $f_i$ is tolerated by $M$ for all $i \le 0$. Since $\rho(R) = P$, $M$ is 0-reachable. Therefore, by Theorem 2.5, $f_i$ is tolerated by $M$ for all $i \ge 0$. This establishes the result.
2.4 On-line Diagnosis

Our notion of on-line diagnosis of a system involves an external detector (assumed to be fault-free) which observes the input and the output of the system and decides whether the behavior of the system is within "acceptable limits" as set forth by our notions of fault tolerance. Initial synchronization of the system with its detector is achieved by using the same reset to initialize both systems.

The formal relation between a system and its detector is that of a "cascade connection."


Definition 2.14: The cascade connection of two systems $S_1$ and $S_2$ for which $R_1 = R_2$ and $I_2 = Z_1 \times I_1$ is the system

$$S_1 * S_2 = (I_1, Q, Z_2, \delta, \lambda, R_1, \rho)$$

where

$$Q = Q_1 \times Q_2$$
$$\delta((q_1, q_2), a, t) = (\delta_1(q_1, a, t),\ \delta_2(q_2, (\lambda_1(q_1, a, t), a), t))$$
$$\lambda((q_1, q_2), a, t) = \lambda_2(q_2, (\lambda_1(q_1, a, t), a), t)$$
$$\rho(r, t) = (\rho_1(r, t), \rho_2(r, t)).$$
Schematically, $S_1 * S_2$ can be pictured as in Fig. 2.18.

Fig. 2.18. The Cascade Connection of $S_1$ and $S_2$


Notation: If $u = z_1 z_2 \cdots z_n \in Z^+$ and $v = a_1 a_2 \cdots a_n \in I^+$ then the pair $[u, v]$ will denote the sequence $(z_1, a_1)(z_2, a_2) \cdots (z_n, a_n) \in (Z \times I)^+$.


Let $S_1 * S_2$ be the cascade connection of $S_1$ with $S_2$. Let $\beta^1$, $\beta^2$, and $\beta^*$ denote the behavior functions of $S_1$, $S_2$, and $S_1 * S_2$ respectively. It can be shown directly from the definition of a cascade connection that for all $x \in I^*$, $q_1 \in Q_1$, $q_2 \in Q_2$, $r \in R_1$, and $t \in T$,

$$\beta^*_{(q_1, q_2)}(x, t) = \beta^2_{q_2}([\beta^1_{q_1}(x, t), x], t)$$

and

$$\beta^*_r(x) = \beta^2_r([\beta^1_r(x), x]).$$
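The cascade construction and the identity $\beta^*_r(x) = \beta^2_r([\beta^1_r(x), x])$ can be checked on a small example. The following Python sketch restricts Definition 2.14 to time-invariant Mealy machines, each given as a triple of dictionaries `(delta, lam, rho)`; this encoding and the helper names are assumptions made for illustration, and the pairing $[u, v]$ is simply `zip(u, v)`.

```python
from itertools import product


def run(delta, lam, q, xs):
    """Output sequence of a time-invariant Mealy machine started in state q."""
    out = []
    for a in xs:
        out.append(lam[(q, a)])
        q = delta[(q, a)]
    return out


def cascade(m1, m2, inputs):
    """S1 * S2 of Definition 2.14 for time-invariant machines (delta, lam, rho).

    S2 reads the pair (z1, a): S1's current output together with S1's input.
    """
    (d1, l1, r1), (d2, l2, r2) = m1, m2
    q1s = {q for (q, _) in d1}
    q2s = {q for (q, _) in d2}
    delta, lam = {}, {}
    for q1, q2, a in product(q1s, q2s, inputs):
        z1 = l1[(q1, a)]
        delta[((q1, q2), a)] = (d1[(q1, a)], d2[(q2, (z1, a))])
        lam[((q1, q2), a)] = l2[(q2, (z1, a))]
    rho = {r: (r1[r], r2[r]) for r in r1}        # requires R1 = R2
    return delta, lam, rho


# S1 outputs the running parity of its input; S2 is memoryless and outputs z XOR a.
bits = (0, 1)
S1 = ({(q, a): q ^ a for q in bits for a in bits},
      {(q, a): q ^ a for q in bits for a in bits}, {"r": 0})
S2 = ({("s", (z, a)): "s" for z in bits for a in bits},
      {("s", (z, a)): z ^ a for z in bits for a in bits}, {"r": "s"})
delta, lam, rho = cascade(S1, S2, bits)

x = [1, 0, 1, 1]
lhs = run(delta, lam, rho["r"], x)                                  # beta*_r(x)
beta1 = run(S1[0], S1[1], S1[2]["r"], x)                            # beta^1_r(x)
rhs = run(S2[0], S2[1], S2[2]["r"], list(zip(beta1, x)))            # beta^2_r([beta^1_r(x), x])
print(lhs, rhs)   # [0, 1, 1, 0] [0, 1, 1, 0]
```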
We can now formally define our notion of on-line diagnosis.

Definition 2.15: Let $(M, F)$ be a machine with faults, let $D$ be a machine for which $M * D$ is defined, and let $k$ be a nonnegative integer. $(M, F)$ is $(D, k)$-1-diagnosable (2-diagnosable) if

i) $\beta^*_r(x) = 0^{|x|}$ for all $r \in R$ and $x \in I^+$, and

ii) if $(r, x, y)$ is a minimal 1-error (2-error) caused by some $f \in F$ then

$$\beta^D_r([\beta^f_r(xw), xw]) \ne 0^{|xw|} \quad \text{for all } w \in I^* \text{ with } |w| = k.$$

Thus, the detector $D$ observes the operation of $M$ and must decide, based on this observation, whether an error has occurred. Note that the fault-free realization $M$ and the detector are both time-invariant (i.e., machines), and that the detector takes no part in the computation of $M$'s output.



Fig. 2. 19. Diagnosis of (M, F) using the Detector D 
The two conditions of Definition 2.15 can be paraphrased as:
i) $D$ responds negatively if no fault occurs; i.e., $D$ gives no false alarms, and

ii) for all $f \in F$, $D$ responds positively within $k$ time steps of the occurrence of the first error caused by $f$.
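Because both conditions quantify over all inputs, they can only be checked exhaustively up to a bound. The following Python sketch searches for a violation of Definition 2.15 for 2-errors (taking $\sigma_3$ to be the identity) over all inputs of length at most `n`. It assumes machines given as dictionaries `(delta, lam, rho)` indexed by the same reset names, detector outputs in which 0 means "no fault", and faults with $\tau \ge 0$ encoded as tuples `(tau, phi, delta2, lam2)`; all of these encodings and names are hypothetical. Finding no violation is evidence of diagnosability up to the bound, not a proof.

```python
from itertools import product


def run(machine, xs, r, fault=None):
    """Outputs of M (or M^f) from reset r on input xs.

    fault = (tau, phi, delta2, lam2): at time tau the state q becomes phi[q],
    and the faulty machine M' = (delta2, lam2) runs from then on.
    """
    delta, lam, rho = machine
    q, out = rho[r], []
    for t, a in enumerate(xs):
        if fault is not None and t == fault[0]:
            _, phi, delta, lam = fault
            q = phi[q]
        out.append(lam[(q, a)])
        q = delta[(q, a)]
    return out


def find_violation(M, D, inputs, faults, k, n):
    """Return a witness violating Definition 2.15 (2-errors), or None if none is
    found among inputs of length <= n (plus k extra steps for detection)."""
    resets = list(M[2])
    # Condition i): no false alarms.
    for r in resets:
        for length in range(1, n + k + 1):
            for x in product(inputs, repeat=length):
                if any(run(D, list(zip(run(M, x, r), x)), r)):
                    return ("false alarm", r, x)
    # Condition ii): every minimal 2-error is flagged within k further steps.
    for f in faults:
        for r in resets:
            for length in range(1, n + 1):
                for x in product(inputs, repeat=length):
                    good, bad = run(M, x, r), run(M, x, r, f)
                    if good[:-1] != bad[:-1] or good[-1] == bad[-1]:
                        continue            # not a minimal 2-error
                    for w in product(inputs, repeat=k):
                        xw = x + w
                        seen = list(zip(run(M, xw, r, f), xw))
                        if not any(run(D, seen, r)):
                            return ("missed error", f, r, x, w)
    return None


# Example: M tracks input parity; D_dup duplicates M and compares outputs.
bits = (0, 1)
par = {(q, a): q ^ a for q in bits for a in bits}
M = (par, dict(par), {"r": 0})
D_dup = ({(q, (z, a)): q ^ a for q in bits for z in bits for a in bits},
         {(q, (z, a)): int(z != (q ^ a)) for q in bits for z in bits for a in bits},
         {"r": 0})
D_zero = ({("s", (z, a)): "s" for z in bits for a in bits},
          {("s", (z, a)): 0 for z in bits for a in bits}, {"r": "s"})
stuck_at_0 = (1, {0: 0, 1: 1}, par, {(q, a): 0 for q in bits for a in bits})
print(find_violation(M, D_dup, bits, [stuck_at_0], k=0, n=3))      # None
print(find_violation(M, D_zero, bits, [stuck_at_0], k=0, n=3)[0])  # missed error
```

The duplicating detector never raises a false alarm and flags every minimal 2-error at once, whereas the constant-0 detector misses the first one it is shown.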

Condition i) implies $0 \in Z_D$, the output alphabet of $D$. Each $z \in Z_D$ other than 0 is called a fault-detection signal. The choice of the symbol "0" to indicate that the machine $M$ is operating properly is purely for notational convenience. In general we could let any subset of $Z_D$ indicate proper operation and let the complement of this set in $Z_D$ be the set of fault-detection signals. In a practical application this choice would depend on the design constraints on the detector.

As we have done with fault tolerance and with errors, if a theorem or remark applies to both "1-diagnosable" and "2-diagnosable" we will state it only once, using the general term "diagnosable."

Let $D$ be a detector for $M$. Then $I_D = Z \times I$. There will be times when the observation of $M$'s input by $D$ will be unnecessary or undesired. If for all $z \in Z$ and $a, b \in I$, $(z, a)$ and $(z, b)$ are equivalent inputs of $D$, then we will say that $D$ is independent of $M$'s input. In this case the behavior of $D$ does not depend on the second coordinate of $D$'s input and we will take $I_D$ to be simply $Z$.

Recall that with this concept of diagnosis we are only considering faults of $M$. Faults of $D$ must be analyzed separately. In finding a realization $M$ of $\tilde{M}$ and a detector $D$ there is some leeway in how much of the added complexity required for diagnosis should go into the detector and how much should go into the realization. If it all goes into the realization then $D$ will serve only to select certain coordinates of $M$'s output to be used as the output of $D$. That is, $D$ will be memoryless and realize a projection. In this case we will say that $(M, F)$ is $k$-self-diagnosable. In general, it is desirable for the detector to be self-diagnosable for some suitable set of faults.

The basic on-line diagnosis problem can now be restated as follows:

Given a machine $\tilde{M}$, a class of faults $F$, a class of detectors $\mathcal{D}$, and a delay $k$, find an (economical) realization $M$ of $\tilde{M}$ and a detector $D \in \mathcal{D}$ such that $(M, F)$ is $(D, k)$-diagnosable.

In this chapter we have developed a model for the study of on-line diagnosis of resettable machines, and we have restated the basic on-line diagnosis problem. In the following chapters results are developed which will help to solve this basic problem.


CHAPTER III 


General Properties of Diagnosis 

In this short chapter we will present a few results on diagnosis 
per se. That is, they are general results which tell us some things 
about diagnosis, independent of the particular fault set being diagnosed 
or of any particular diagnosis technique. In the following chapters 
we look at the diagnosis of specific sets of faults and investigate 
the capabilities and limitations of on-line diagnosis techniques. 

It is interesting to see how our concept of on-line diagnosis compares with a similar concept introduced by Carter and Schneider [7] and called "fault-secure" by Anderson [1]. As stated by Anderson, "A circuit is fault-secure if, for every fault in a prescribed set, the circuit never produces incorrect code space outputs for code space inputs."

Before making a formal comparison, this notion must be translated into our framework. In doing so we will strive to be faithful to Anderson's intent.

Definition 3.1: A machine with faults, $(M, F)$, is fault-secure if, whenever $(r, x, ya)$, where $a \in Z$, is a minimal 2-error caused by some $f \in F$, we have $a \notin \{\beta_s(u) \mid s \in R,\ u \in I^+\}$.
Thus if $(M, F)$ is fault-secure then a combinational detector which only observes the output of $M$ can detect all minimal 2-errors. More formally,

Theorem 3.1: $(M, F)$ is fault-secure if and only if $(M, F)$ is $(D, 0)$-2-diagnosable for some detector $D$ which is memoryless and independent of $M$'s input.


Proof: (Necessity) Assume that $(M, F)$ is fault-secure. Define $\lambda_D: Z \to \{0, 1\}$ by

$$\lambda_D(z) = \begin{cases} 0 & \text{if } z \in \{\beta_r(x) \mid r \in R,\ x \in I^+\} \\ 1 & \text{otherwise.} \end{cases}$$

Let $D$ be the memoryless detector which realizes $\lambda_D$. Then $D$ is independent of $M$'s input, and it can easily be verified that $(M, F)$ is $(D, 0)$-2-diagnosable.

(Sufficiency) Assume that $(M, F)$ is $(D, 0)$-2-diagnosable where $D$ is memoryless and independent of $M$'s input. Let $\lambda_D: Z \to \{0, 1\}$ denote the function realized by $D$ and let $Z' = \{\beta_r(x) \mid r \in R,\ x \in I^+\}$. Then $\lambda_D(z) = 0$ for all $z \in Z'$, for otherwise a false alarm could occur. Let $(r, x, ya)$, where $a \in Z$, be a minimal 2-error caused by some $f \in F$. If $a \in Z'$ then $\lambda_D(a) = 0$ and the error is not detected without delay. Therefore $a \notin Z'$. Hence $(M, F)$ is fault-secure.
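The detector built in the proof of Theorem 3.1 depends only on the set $Z'$ of output symbols that fault-free $M$ can produce, which for a finite time-invariant machine is the set of values $\lambda(q, a)$ over reachable states $q$ and inputs $a$. A minimal Python sketch, assuming a machine given by dictionaries; the two-rail output code in the example is a hypothetical illustration in which any fault that drives the output to a non-code word ("01" or "10") is caught without delay.

```python
from collections import deque


def reachable_outputs(delta, lam, inputs, reset_states):
    """Z' = {beta_r(x) : r in R, x in I+}: every output symbol the fault-free
    machine can emit, i.e. lam(q, a) over reachable q and all inputs a."""
    seen, frontier = set(reset_states), deque(reset_states)
    while frontier:
        q = frontier.popleft()
        for a in inputs:
            nxt = delta[(q, a)]
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return {lam[(q, a)] for q in seen for a in inputs}


def memoryless_detector(z_prime):
    """The lambda_D of Theorem 3.1: 0 on Z', 1 on every other output symbol."""
    return lambda z: 0 if z in z_prime else 1


# A parity machine whose outputs use a two-rail code: "00" for 0 and "11" for 1.
bits = (0, 1)
delta = {(q, a): q ^ a for q in bits for a in bits}
lam = {(q, a): "11" if q ^ a else "00" for q in bits for a in bits}
z_prime = reachable_outputs(delta, lam, bits, [0])
lam_D = memoryless_detector(z_prime)
print(sorted(z_prime), lam_D("01"), lam_D("11"))   # ['00', '11'] 1 0
```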


Thus the concept of $(D, k)$-diagnosability is a generalization of the concept of fault-secure. In particular, $(D, k)$-diagnosis allows for (i) different tolerance relations, (ii) nonzero delay in diagnosis, (iii) detectors with memory, and (iv) explicit observation by the detector of the input to the system being monitored.

Our next result shows that "2-diagnosable" is indeed a stronger property than "1-diagnosable." This result is a consequence of the fact that every 1-error is a 2-error but not conversely.

Theorem 3.2: If $(M, F)$ is $(D, k)$-2-diagnosable then $(M, F)$ is $(D, k)$-1-diagnosable, but not conversely.

Proof: Let $(M, F)$ be $(D, k)$-2-diagnosable. Then no false alarms will occur and every minimal 2-error will be detected within $k$ time steps of its occurrence. Let $(r, x, y)$ be a minimal 1-error. Then $\sigma_3(\beta_r(x)) \ne \sigma_3(y)$ and hence $\beta_r(x) \ne y$. Thus $(r, x_1, y_1)$ is a minimal 2-error for some $x_1$ and $y_1$ such that $x = x_1x_2$ and $y = y_1y_2$. Since this minimal 2-error is detected within $k$ time steps of its occurrence, and it occurs no later than the minimal 1-error $(r, x, y)$, the latter must also be detected within $k$ time steps of its occurrence. Hence $(M, F)$ is $(D, k)$-1-diagnosable.

The counterexample which shows that the converse does not hold is given in the next chapter in the proof of Theorem 4.4.

Although the converse of Theorem 3.2 does not hold in general, the following partial converse can be obtained.

Theorem 3.3: If $(M, F)$ is $(D, k)$-1-diagnosable and $\sigma_3$ is 1-1 then $(M, F)$ is $(D, k)$-2-diagnosable.

Proof: We observed in Section 2.3 that if $\sigma_3$ is 1-1 then every 2-error is a 1-error. The result is an immediate consequence of this fact.
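The relationship between the two kinds of error used in Theorems 3.2 and 3.3 can already be seen on output sequences. In this hypothetical Python sketch, `first_error` returns the time of the first 2-error or, when a function standing in for $\sigma_3$ is supplied, the first 1-error. The first 2-error never comes later than the first 1-error, and the two coincide when the supplied function is 1-1.

```python
def first_error(good, bad, sigma3=None):
    """Time of the first error in `bad` relative to `good`, or None.

    With sigma3=None this is the first 2-error (raw outputs differ); otherwise it
    is the first 1-error, where the outputs differ after applying sigma3.
    """
    view = sigma3 if sigma3 is not None else (lambda z: z)
    for t, (g, b) in enumerate(zip(good, bad)):
        if view(g) != view(b):
            return t
    return None


good, bad = [0, 1, 2, 3], [0, 1, 3, 0]
print(first_error(good, bad))                       # 2
print(first_error(good, bad, lambda z: z // 2))     # 3  (sigma3 not 1-1)
print(first_error(good, bad, lambda z: z + 10))     # 2  (sigma3 is 1-1)
```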

The next result will help us to see the relationship between fault diagnosis and fault tolerance.

Theorem 3.4: Let $(M, F)$ be a machine with faults. If $F$ is tolerated by $M$ then $(M, F)$ is $(D_0, 0)$-diagnosable, where $D_0$ is a trivial memoryless machine which realizes the constant 0 function.

Proof: Condition i) is clearly satisfied, and condition ii) is satisfied because if $F$ is tolerated by $M$ then no $f \in F$ will cause any errors.

The decision in this case can be made trivially, since no errors are ever produced. The situation for tolerated faults is not as simple as this result may seem to indicate, for it must be remembered that 1-tolerated does not imply 2-tolerated, and thus a 1-tolerated fault could be detected through a 2-error (see Example 2.6).

We will now develop some results concerning diagnosis which are analogous to Theorems 2.5, 2.6, and 2.7. Recall that these theorems allowed us to infer the tolerance of an infinite set of equivalent faults from the knowledge that a specific finite subset of them is tolerated.
Theorem 3.5: Let $M$ be a machine and let $D$ be a detector for $M$. Suppose that the cascade connection $M * D$ is $\ell$-reachable, and that $f$ is a fault of $M$. If $(M, \{f_i\})$ is $(D, k)$-diagnosable for $0 \le i \le \ell$ then $(M, \{f_i\})$ is $(D, k)$-diagnosable for all $i \ge 0$.

Proof: Assume that $(M, \{f_i\})$ is $(D, k)$-diagnosable for $0 \le i \le \ell$. Then condition i) of Definition 2.15 is immediately satisfied. Let $(r, x, w)$ be a minimal error caused by $f_i$ where $i > \ell$, and let $u \in I^*$ with $|u| = k$. To show that $(M, \{f_i\})$ is $(D, k)$-diagnosable for $i > \ell$ we need only show that $\beta^D_r([\beta^{f_i}_r(xu), xu]) \ne 0^{|xu|}$.

Let $x = x_1z$ where $|x_1| = i$, and let $\delta^*(\rho^*(r), x_1) = (q, q')$. Since $M * D$ is $\ell$-reachable, there exist $s \in R$ and $y \in I^*$ with $0 \le |y| \le \ell$ such that $\delta^*(\rho^*(s), y) = (q, q')$. Say $|y| = j$. By Theorem 2.3, $f_j$ causes the corresponding minimal error from reset $s$ on input $yz$. Since $(M, \{f_j\})$ is $(D, k)$-diagnosable, $\beta^D_s([\beta^{f_j}_s(yzu), yzu]) \ne 0^{|yzu|}$, and since the fault-detection signal must occur after the fault occurs,

$$\beta^D_{q'}([\beta'_{\varphi(q)}(zu, j), zu]) \ne 0^{|zu|}.$$

Now by Theorem 2.3, $\beta'_{\varphi(q)}(zu, i) = \beta'_{\varphi(q)}(zu, j)$, and hence $\beta^D_{q'}([\beta'_{\varphi(q)}(zu, i), zu]) \ne 0^{|zu|}$. Therefore

$$\beta^D_r([\beta^{f_i}_r(xu), xu]) = \beta^D_r([\beta_r(x_1), x_1])\,\beta^D_{q'}([\beta'_{\varphi(q)}(zu, i), zu]) \ne 0^{|xu|}.$$

Hence $(M, \{f_i\})$ is $(D, k)$-diagnosable for all $i \ge 0$.
Example 2.8, which shows that the hypothesis of Theorem 2.5 cannot be weakened, works likewise for Theorem 3.5. This example works for both fault tolerance and fault diagnosis because, as was pointed out by Theorem 3.4, tolerated faults are trivially diagnosable.

Theorem 3.6: Let $M$ be a machine and let $D$ be a detector for $M$ such that $M * D$ is $\ell$-reachable. If $f$ is a permanent fault of $M$ and $(M, \{f_i\})$ is $(D, k)$-diagnosable for $-1 \le i \le \ell$ then $(M, \{f_i\})$ is $(D, k)$-diagnosable for all $i \in T$.

Proof: Assume that $f$ is a permanent fault and that $(M, \{f_i\})$ is $(D, k)$-diagnosable for $-1 \le i \le \ell$. By Theorem 3.5, $(M, \{f_i\})$ is $(D, k)$-diagnosable for all $i \ge 0$. By Lemma 2.1, $\beta^{f_i}_r = \beta^{f_{-1}}_r$ for all $r \in R$ and $i < 0$. Hence every $f_i$ with $i < 0$ will cause exactly the same errors. Since $(M, \{f_{-1}\})$ is $(D, k)$-diagnosable, it follows that $(M, \{f_i\})$ is $(D, k)$-diagnosable for all $i < 0$. This establishes the result.

Let $D$ be a detector for a machine $M$. It will often be the case that the second coordinate of the state of $M * D$ can be uniquely determined from the first coordinate. In particular, this is always the case when $|Q_D| = 1$. More formally, the cascade connection of $M_1$ with $M_2$ is synchronized if there exists a function $h: Q_1 \to Q_2$ such that $h(q_1) = q_2$ for each $(q_1, q_2)$ in the reachable part of $M_1 * M_2$. Such a function is called the synchronizing function of $M_1 * M_2$, and it must satisfy $h(\rho_1(r)) = \rho_2(r)$ for each $r \in R$.
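Whether $M * D$ is synchronized, and what $h$ is, can be read off the reachable part of the cascade. A minimal Python sketch for time-invariant machines, using a hypothetical dictionary encoding in which the second machine's transitions are indexed by `(q2, (z1, a))` as in Definition 2.14. The example contrasts a duplicating detector, which is synchronized, with a detector that merely toggles its state on every step, which is not.

```python
from collections import deque


def synchronizing_function(m1, m2, inputs):
    """Return h: Q1 -> Q2 as a dict if M1 * M2 is synchronized, else None.

    m1 = (delta1, lam1, rho1); m2 = (delta2, rho2), where delta2 is indexed by
    (q2, (z1, a)) as in the cascade connection of Definition 2.14.
    """
    (d1, l1, r1), (d2, r2) = m1, m2
    start = {(r1[r], r2[r]) for r in r1}
    seen, frontier = set(start), deque(start)
    while frontier:                       # explore the reachable part of M1 * M2
        q1, q2 = frontier.popleft()
        for a in inputs:
            nxt = (d1[(q1, a)], d2[(q2, (l1[(q1, a)], a))])
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    h = {}
    for q1, q2 in seen:
        if h.setdefault(q1, q2) != q2:   # q1 is paired with two different q2's
            return None
    return h


bits = (0, 1)
par = {(q, a): q ^ a for q in bits for a in bits}
M = (par, dict(par), {"r": 0})
dup = ({(q, (z, a)): q ^ a for q in bits for z in bits for a in bits}, {"r": 0})
toggle = ({(q, (z, a)): 1 - q for q in bits for z in bits for a in bits}, {"r": 0})
print(sorted(synchronizing_function(M, dup, bits).items()))   # [(0, 0), (1, 1)]
print(synchronizing_function(M, toggle, bits))                # None
```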

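If $M * D$ is synchronized and $M$ is $\ell$-reachable then $M * D$ is also $\ell$-reachable, since every reachable state $(q, q')$ satisfies $q' = h(q)$, so any input that reaches $q$ within $\ell$ steps also reaches $(q, q')$. We have observed in Chapter II that $M$ is 0-reachable if and only if $\rho(R) = P$, and that, in particular, every memoryless machine is 0-reachable. Hence if $\rho(R) = P$ and $M * D$ is synchronized then $M * D$ is 0-reachable. In this case we know, by Theorem 3.5, that if $(M, \{f_0\})$ is $(D, k)$-diagnosable then $(M, \{f_i\})$ is $(D, k)$-diagnosable for all $i \ge 0$.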

We terminate this line of development by stating the strongest result of this nature.

Theorem 3.7: Let $M$ be a machine for which $\rho(R) = P$. Let $D$ be a detector for $M$ such that $M * D$ is synchronized. Let $f = (M', \tau, \varphi)$ be a permanent fault for which $\rho'(r) = \varphi(\rho(r))$ for all $r \in R$. If $(M, \{f_i\})$ is $(D, k)$-diagnosable for any $i \le 0$ then $(M, \{f_i\})$ is $(D, k)$-diagnosable for all $i \in T$.

Proof: Assume that $(M, \{f_t\})$ is $(D, k)$-diagnosable where $t \le 0$. By Lemma 2.2, $\beta^{f_i}_r = \beta^{f_j}_r$ for all $r \in R$ and $i, j \le 0$. Therefore $(M, \{f_i\})$ is $(D, k)$-diagnosable for all $i \le 0$. Since $\rho(R) = P$ and $M * D$ is synchronized, $M * D$ is 0-reachable. Thus by Theorem 3.5, $(M, \{f_i\})$ is $(D, k)$-diagnosable for all $i \ge 0$. This establishes the result.



CHAPTER IV 

Diagnosis of Unrestricted Faults 

The investigation of this chapter is concerned with the general case in which the set of potential faults is "unrestricted." This set of faults is precisely the set of all faults of the machine being diagnosed, and hence it is truly unrestricted.

Aside from representing a "worst-case" fault environment, there are certain practical reasons for considering unrestricted faults, at least at the outset. In particular, as the scale of integrated circuit technology becomes larger, it becomes more difficult to postulate a suitably restricted class of faults such as the class of all "stuck-at" faults. Moreover, although other failure models such as bridging failures have been proposed and studied (see [15] and [26], for example), little is known about the diagnosis of such failures.

In addition, intermittent and multiple failures are also possible and are even more difficult to model. Finally, for a given failure it may be impossible to determine the $\varphi$ function of the fault caused by this failure. Thus fault sets which do not restrict the fault mapping $\varphi$ are advantageous.

Unrestricted faults are typically diagnosed using the technique of duplication. One of the aims of this chapter is to take a deeper look at duplication and at a generalization of this scheme. An alternative to using duplication for the diagnosis of unrestricted faults is investigated in Chapter V.

The main result in this chapter states that to achieve 1-diagnosis of the unrestricted faults of a machine $M$, the detector must have at least as many states as $\tilde{M}$, the behavioral specification for $M$. Furthermore, to achieve 2-diagnosis, the detector must have at least as many states as $M_R$, the reduction of $M$. These bounds on the state set size of the detector are independent of the delay allowed for the diagnosis.
4.1 Unrestricted Faults

As stated above, the set of unrestricted faults of a machine is simply the set of all faults of that machine. More formally,

Definition 4.1: The set of unrestricted faults of machine $M$, denoted by $U_M$, is the set $U_M = \{f \mid f \text{ is a fault of } M\}$. That is,

$$U_M = \{(M', \tau, \varphi) \mid M' \in \mathcal{S},\ \tau \in T,\ \text{and}\ \varphi: Q \to Q'\}.$$

When it is clear which machine is under consideration, the identifying subscript will be dropped.

One important property of the set of unrestricted faults is the relation between this fault set and the set of errors that may be caused by faults in this set. Given any $r \in R$, $x \in I^+$ and $y \in Z^+$ with $|x| = |y|$, there is a fault $f \in U$ such that $\beta^f_r(x) = y$. Therefore faults in $U$ can cause any possible erroneous behavior, and for $(M, U)$ to be $(D, k)$-diagnosable all of these possible erroneous behaviors will have to be detected by $D$.
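One simple witness for this observation is a fault that occurs at time 0 and replaces $M$ by a machine that ignores its input and plays back $y$. The following Python sketch assumes that $U$ admits such an $M'$ (an ordinary finite-state machine with $|y| + 1$ states); the construction and the names `chain_fault` and `faulty_run` are illustrative, not taken from the text.

```python
def chain_fault(y):
    """A hypothetical fault f = (M', 0, phi) with beta^f_r(x) = y whenever |x| = |y|.

    M' ignores its input: its states are 0..len(y), state t emits y[t], and the
    last state is absorbing (it repeats the final symbol). phi maps every state
    of M to state 0 of M'.
    """
    n = len(y)

    def delta2(q, a):
        return min(q + 1, n)

    def lam2(q, a):
        return y[min(q, n - 1)]

    def phi(q):
        return 0

    return delta2, lam2, phi


def faulty_run(fault, q0, xs):
    """Outputs of M^f on xs when the fault occurs at time 0 (M never acts)."""
    delta2, lam2, phi = fault
    q, out = phi(q0), []
    for a in xs:
        out.append(lam2(q, a))
        q = delta2(q, a)
    return out


print(faulty_run(chain_fault(["b", "a", "c"]), "any state", ["x", "y", "z"]))
# ['b', 'a', 'c']
```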

Due to the above observation it is clear that the output of $M^f$ (the system actually being observed by the detector) can give no information about what the correct output should be. Therefore, for the diagnosis of unrestricted faults, the ability of $D$ to observe $M$'s input directly is crucial. This observation is made explicit in the following result.
Theorem 4.1: If $(M, U)$ is $(D, k)$-1-diagnosable, $D$ is independent of $M$'s input, and $M$ is transition distinct, then $M$ is autonomous.

Proof: Suppose that $(M, U)$ is $(D, k)$-1-diagnosable, $D$ is independent of $M$'s input, and $M$ is transition distinct. Assume, to the contrary, that $M$ is not autonomous. Then there exist $r \in R$ and $x, y \in I^+$ such that $|x| = |y|$ and $\sigma_3(\beta_r(x)) \ne \sigma_3(\beta_r(y))$. Let $v \in I^*$ with $|v| = k$. For no false alarms to occur we must have $\beta^D_r(\beta_r(xv)) = 0^{|xv|}$ and $\beta^D_r(\beta_r(yv)) = 0^{|yv|}$. Let $f \in U$ be a fault for which $\beta^f_r(xv) = \beta_r(yv)$.

Since $(r, x, \beta^f_r(x))$ is a 1-error, it must be detected within $k$ time steps of its occurrence. But $\beta^D_r(\beta^f_r(xv)) = \beta^D_r(\beta_r(yv)) = 0^{|yv|}$. Contradiction. Hence $M$ must be autonomous.
4.2 Diagnosis via Independent Computation and Comparison

It is a well-known and obvious fact that if a system is duplicated and both copies are run in parallel with the same inputs then, by dynamically comparing the outputs of the two copies, any error which does not appear simultaneously in both copies will be immediately detected.

Our view of duplication is shown in Fig. 4.1.

Fig. 4.1. Diagnosis via Duplication in the Detector

In this figure the detector $D$ consists of a copy of $M$ along with a generalized Exclusive-OR gate whose output is 0 if and only if its inputs are identical. Given such a detector $D$, it is immediately clear that $(M, U)$ is $(D, 0)$-2-diagnosable, since at the moment a minimal 2-error occurs the output of $M^f$ differs from that of the fault-free copy.

Duplication is an expensive technique, involving somewhat more than twice the circuitry required for the unchecked system alone, but it has a number of positive attributes. In addition to being capable of diagnosing the unrestricted set of faults, it is easy to synthesize, and self-testing and self-diagnosable comparators are known to exist [1].

The basic configuration shown in Fig. 4.1 can be generalized to the configuration shown in Fig. 4.2.

Fig. 4.2. A Generalization of Duplication in the Detector

In this figure the detector consists of a machine $M'$ which runs in parallel with $M$ and a combinational comparator $C$ which dynamically compares the outputs of $M$ and $M'$. Note that for the cascade connection $M * D$ to be defined we must have $I' = I$ and $R' = R$.

With this scheme $M'$ may be much less complex than $M$. However, we will show that there is a relationship between the size of the state set of $M'$ and the level of diagnosis which may be possible using $M'$.
In the following result we give a necessary and sufficient condition for $(M, U)$ to be $(D, 0)$-diagnosable where $D$ is structured as in Fig. 4.2. The basic intuition for this result is that $(M, U)$ is $(D, 0)$-1-diagnosable if and only if it is possible to perfectly predict the behavior of $\tilde{M}$ from that of $M'$.


Theorem 4.2: Let $M$ realize $\tilde{M}$ under $(\sigma_1, \sigma_2, \sigma_3)$ and let $[M', C]$ denote a detector for $M$ constructed from $M'$ and $C$ as shown in Fig. 4.2. There exists $\sigma'_3$ such that $M'$ realizes $\tilde{M}$ under $(\sigma_1, \sigma_2, \sigma'_3)$ if and only if there exists $C$ such that $(M, U)$ is $([M', C], 0)$-1-diagnosable. Similarly, there exists $\sigma'_3$ such that $M'$ realizes $M$ under $(e, e, \sigma'_3)$ if and only if there exists a $C$ such that $(M, U)$ is $([M', C], 0)$-2-diagnosable.


Proof: (Necessity) Assume that $M'$ realizes $\tilde{M}$ under $(\sigma_1, \sigma_2, \sigma'_3)$. Then $\sigma'_3 \circ \beta'_{\sigma_2(r)} \circ \sigma_1 = \tilde{\beta}_r$ for all $r \in \tilde{R}$. Since $M$ realizes $\tilde{M}$ under $(\sigma_1, \sigma_2, \sigma_3)$, $\sigma_3 \circ \beta_{\sigma_2(r)} \circ \sigma_1 = \tilde{\beta}_r$ for all $r \in \tilde{R}$. Hence

$$\sigma_3 \circ \beta_{\sigma_2(r)} \circ \sigma_1 = \sigma'_3 \circ \beta'_{\sigma_2(r)} \circ \sigma_1.$$

Recall that $\sigma_1$ and $\sigma_2$ are assumed to be onto. Because of this assumption, it follows that $\sigma_3 \circ \beta_r = \sigma'_3 \circ \beta'_r$ for all $r \in R$. Let $C$ be the comparator shown in Fig. 4.3.

Fig. 4.3. The Comparator Used in the Proof of Theorem 4.2

Since $\sigma_3 \circ \beta_r = \sigma'_3 \circ \beta'_r$, the detector $[M', C]$ will give no false alarms. Let $(r, x, y)$ be a minimal 1-error caused by $f \in U$. Then $\sigma_3(\beta_r(x)) \ne \sigma_3(\beta^f_r(x))$. Hence $\sigma_3(\beta^f_r(x)) \ne \sigma'_3(\beta'_r(x))$, and this will cause the Exclusive-OR gate to emit a 1. Therefore the minimal 1-error $(r, x, y)$ is detected with no delay. Hence $(M, U)$ is $([M', C], 0)$-1-diagnosable.

Similarly, if $M'$ realizes $M$ under $(e, e, \sigma'_3)$ then $\beta_r = \sigma'_3 \circ \beta'_r$, and a comparator as shown in Fig. 4.3, but without the $\sigma_3$ function, can be used to achieve $([M', C], 0)$-2-diagnosis of $(M, U)$.

(Sufficiency) Assume that $(M, U)$ is $([M', C], 0)$-1-diagnosable. To prove that there exists a $\sigma'_3$ such that $M'$ realizes $\tilde{M}$ under $(\sigma_1, \sigma_2, \sigma'_3)$ we must exhibit a function $\sigma'_3$ and show that $\sigma_3 \circ \beta_r = \sigma'_3 \circ \beta'_r$ for all $r \in R$. This is sufficient because $M$ realizes $\tilde{M}$ under $(\sigma_1, \sigma_2, \sigma_3)$ and $\sigma_1$ and $\sigma_2$ are assumed to be onto.

Since no false alarms may occur we know that $C(\beta_r(x), \beta'_r(x)) = 0$ for all $r \in R$ and $x \in I^+$. Define $\sigma'_3$ as follows: $\sigma'_3(\beta'_r(x)) = \sigma_3(\beta_r(x))$. Since $\sigma'_3$ has the desired property, we must simply verify that it is indeed a function.

It is clear that every $z \in \{\beta'_r(x) \mid r \in R,\ x \in I^+\}$ has an image under $\sigma'_3$. To see that this image is unique, suppose that $\beta'_r(x) = \beta'_s(y)$. We must show that $\sigma_3(\beta_r(x)) = \sigma_3(\beta_s(y))$. Let $a = \beta'_r(x) = \beta'_s(y)$, $b = \beta_r(x)$, and $c = \beta_s(y)$. Then $C(b, a) = C(c, a) = 0$. Assume, to the contrary, that $\sigma_3(b) \ne \sigma_3(c)$. Let $f \in U$ be a fault which causes the output of $M$ to be $c$ at time $|x| - 1$ and which has no other effect. Let $x = uv$ where $v \in I$. Then $(r, x, \beta_r(u)c)$ is a minimal 1-error, and since $C(c, a) = 0$, it is not detected when it occurs. This contradicts the assumption that $(M, U)$ is $([M', C], 0)$-1-diagnosable. Hence $\sigma'_3$ is a function and $M'$ realizes $\tilde{M}$ under $(\sigma_1, \sigma_2, \sigma'_3)$.

The proof that $(M, U)$ being $([M', C], 0)$-2-diagnosable implies that there exists a function $\sigma'_3$ such that $M'$ realizes $M$ under $(e, e, \sigma'_3)$ is essentially the same as the above proof.

Recall that if a machine $M_1$ realizes a machine $M_2$ and $M_2$ is reduced and reachable, then $|Q_1| \ge |Q_2|$. Hence, with $\tilde{M}$ reduced and reachable, Theorem 4.2 tells us that if we use the scheme shown in Fig. 4.2 for the diagnosis of unrestricted faults then we must have $|Q'| \ge |\tilde{Q}|$ in order to achieve 1-diagnosis, and $|Q'| \ge |Q_R|$ in order to achieve 2-diagnosis, where $M_R$ is the reduction of $M$.
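The detector $[M', C]$ from the necessity part of Theorem 4.2 can be sketched directly: $M'$ runs on $M$'s input, and $C$ emits 0 exactly when $\sigma_3$ of $M$'s output equals $\sigma'_3$ of $M'$'s output. In the hypothetical Python example below, $M$ has four states (input parity plus a bit that toggles on every step), only the parity survives $\sigma_3$, and $M'$ is the two-state parity machine, so $|Q'| = |\tilde{Q}| = 2$ as the bound above requires. Each faulty trajectory corrupts a single output and has no other effect, like the fault used in the sufficiency proof: a parity error raises an alarm at once, while corrupting only the toggling bit is a 2-error but not a 1-error and goes unflagged.

```python
def run(delta, lam, q, xs):
    """Output sequence of a time-invariant Mealy machine from state q."""
    out = []
    for a in xs:
        out.append(lam[(q, a)])
        q = delta[(q, a)]
    return out


def comparator_detector(delta_p, lam_p, sigma3, sigma3_p):
    """[M', C]: M' tracks M's input; C outputs 0 iff sigma3(z) == sigma3_p(z')."""
    def step(q_p, z, a):
        alarm = 0 if sigma3(z) == sigma3_p(lam_p[(q_p, a)]) else 1
        return delta_p[(q_p, a)], alarm
    return step


def monitor(step, q0, zs, xs):
    """Alarm sequence produced while observing M's outputs zs on inputs xs."""
    q, alarms = q0, []
    for z, a in zip(zs, xs):
        q, alarm = step(q, z, a)
        alarms.append(alarm)
    return alarms


bits = (0, 1)
states = [(p, c) for p in bits for c in bits]
delta = {((p, c), a): (p ^ a, 1 - c) for (p, c) in states for a in bits}
lam = {((p, c), a): (p ^ a, c) for (p, c) in states for a in bits}
delta_p = {(p, a): p ^ a for p in bits for a in bits}      # M': parity only
step = comparator_detector(delta_p, dict(delta_p), lambda z: z[0], lambda z: z)

x = [1, 0, 1, 1]
good = run(delta, lam, (0, 0), x)
parity_hit = good[:2] + [(1 - good[2][0], good[2][1])] + good[3:]
toggle_hit = good[:2] + [(good[2][0], 1 - good[2][1])] + good[3:]
print(monitor(step, 0, good, x))        # [0, 0, 0, 0]
print(monitor(step, 0, parity_hit, x))  # [0, 0, 1, 0]
print(monitor(step, 0, toggle_hit, x))  # [0, 0, 0, 0]
```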
4.3 Diagnosis with Zero Delay

The question answered next is whether it is possible to achieve $(D, 0)$-1-diagnosis of $(M, U)$ with a detector which is less complex, in terms of state set size, than the reduced and reachable specification $\tilde{M}$. One reason to believe that this may be possible is the observation that if $\tilde{M}$ has an inverse then this inverse may have fewer states than $\tilde{M}$, and yet a detector constructed using this inverse may be capable of diagnosing all of $U$. Examples of such inverses are given in the following chapter.

Theorem 4.3: If $(M, U)$ is $(D, 0)$-1-diagnosable then $|Q_D| \ge |\tilde{Q}|$.

Proof: Let $(M, U)$ be $(D, 0)$-1-diagnosable, and assume, to the contrary, that $|Q_D| < |\tilde{Q}|$. Without loss of generality, assume that $M$ is reachable.

Claim: There exist $q, q' \in Q$ and $s \in Q_D$ such that $(q, s), (q', s) \in P^*$, the reachable part of $M * D$, and $\sigma_3 \circ \beta_q \ne \sigma_3 \circ \beta_{q'}$.

Let $g: Q \to \mathcal{P}(Q_D) - \{\emptyset\}$ (where $\mathcal{P}(Q_D) = \{X \mid X \subseteq Q_D\}$) be defined by $g(q) = \{s \mid (q, s) \in P^*\}$. Assume that the claim is not true. Then $\sigma_3 \circ \beta_q \ne \sigma_3 \circ \beta_{q'}$ implies $g(q) \cap g(q') = \emptyset$. We know from the proof of Theorem A.2 that for each $\tilde{q} \in \tilde{Q}$ there is a state $f(\tilde{q}) \in Q$ for which $\tilde{\beta}_{\tilde{q}} = \sigma_3 \circ \beta_{f(\tilde{q})} \circ \sigma_1$, and that $f$ is necessarily 1-1. Since $\tilde{M}$ is reduced and reachable, there must exist $n = |\tilde{Q}|$ distinct states $\{q_1, \ldots, q_n\} \subseteq Q$ (namely the states $f(\tilde{q})$) such that $i \ne j$ implies $g(q_i) \cap g(q_j) = \emptyset$. Since each $g(q_i)$ is nonempty, $|Q_D| \ge |\tilde{Q}|$. Contradiction. This establishes the claim.

Let $q, q' \in Q$ and $s \in Q_D$ be such that $(q, s), (q', s) \in P^*$ and $\sigma_3 \circ \beta_q \ne \sigma_3 \circ \beta_{q'}$. Then there exists a sequence $ua$, where $u \in I^*$ and $a \in I$, such that $\sigma_3(\beta_q(ua)) \ne \sigma_3(\beta_{q'}(ua))$ and, if $u$ is nonempty, $\sigma_3(\beta_q(u)) = \sigma_3(\beta_{q'}(u))$. Since $(q, s) \in P^*$, there exist $r \in R$ and $y \in I^*$ such that $\delta^*(\rho^*(r), y) = (q, s)$.

Recall that given any $r \in R$, $x \in I^+$ and $v \in Z^+$ with $|x| = |v|$, there is a fault $f \in U$ such that $\beta^f_r(x) = v$. Let $f \in U$ be a fault for which $\beta^f_r(yua) = \beta_r(y)\beta_{q'}(ua)$. Since $\sigma_3(\beta_q(u)) = \sigma_3(\beta_{q'}(u))$, it follows that $(r, yua, \beta^f_r(yua))$ is a minimal 1-error. Now, since $(M, U)$ is $(D, 0)$-1-diagnosable, $\beta^D_r([\beta^f_r(yua), yua]) \ne 0^{|yua|}$. Since no false alarms may occur, $\beta^D_r([\beta_r(y), y]) = 0^{|y|}$. Also, since $(q', s) \in P^*$, $\beta^D_s([\beta_{q'}(ua), ua]) = 0^{|ua|}$. Thus

$$\begin{aligned}
\beta^D_r([\beta^f_r(yua), yua]) &= \beta^D_r([\beta_r(y)\beta_{q'}(ua), yua]) \\
&= \beta^D_r([\beta_r(y), y])\,\beta^D_s([\beta_{q'}(ua), ua]) \\
&= 0^{|y|}\,0^{|ua|} \\
&= 0^{|yua|}.
\end{aligned}$$

This contradicts the assumption that $(M, U)$ is $(D, 0)$-1-diagnosable. Therefore $|Q_D| \ge |\tilde{Q}|$.
Corollary 4.3.1: If $(M, U)$ is $(D, 0)$-2-diagnosable then $|Q_D| \ge |Q_R|$, where $M_R$ is the reduction of $M$.

Proof: Assume that $(M, U)$ is $(D, 0)$-2-diagnosable, and consider $M$ to be realizing $M_R$. By Theorem 3.2, $(M, U)$ is $(D, 0)$-1-diagnosable, and hence, by Theorem 4.3, $|Q_D| \ge |Q_R|$.

Let us now consider the set of faults of $M$ which are caused by the output of $M$ becoming stuck-at-$v$, where $v \in Z$, at some time $\tau$. More formally, the set of permanent output faults of $M$ is the set

$$F_o = \{f = (M', \tau, e) \mid M' = (I, Q, Z, \delta, \lambda', R, \rho) \text{ where } \lambda'(q, a) = \lambda'(s, b) \text{ for all } q, s \in Q \text{ and } a, b \in I\}.$$
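The step in the proof of Theorem 4.4 that replaces an arbitrary fault by a stuck-at fault is easy to make concrete. In this Python sketch (hypothetical names; a time-invariant machine given by dictionaries), a permanent output fault keeps $\delta$ and the identity state mapping $e$ but forces the output to $v$ from time $\tau$ on, and `matching_stuck_at` returns the fault in $F_o$ that reproduces a given minimal 2-error.

```python
def run(delta, lam, q, xs, stuck=None):
    """Outputs of M from state q; stuck = (tau, v) forces every output from time
    tau on to v, while the state evolves exactly as in M (phi is the identity)."""
    out = []
    for t, a in enumerate(xs):
        z = lam[(q, a)]
        if stuck is not None and t >= stuck[0]:
            z = stuck[1]
        out.append(z)
        q = delta[(q, a)]
    return out


def matching_stuck_at(good, bad):
    """For a minimal 2-error (good and bad agree except in their last symbol),
    return the fault (tau, b) in F_o that produces the same erroneous output."""
    if good[:-1] != bad[:-1] or good[-1] == bad[-1]:
        raise ValueError("not a minimal 2-error")
    return len(good) - 1, bad[-1]


bits = (0, 1)
par = {(q, a): q ^ a for q in bits for a in bits}
x = [1, 0, 1]
good = run(par, par, 0, x)                 # [1, 1, 0]
bad = [1, 1, 1]                            # output of some arbitrary fault in U
fault = matching_stuck_at(good, bad)
print(fault, run(par, par, 0, x, stuck=fault))   # (2, 1) [1, 1, 1]
```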

Because the set of permanent output faults causes the same minimal 2-errors as the set of unrestricted faults, if $(M, F_o)$ is $(D, 0)$-2-diagnosable then $(M, U)$ is $(D, 0)$-2-diagnosable. However, $U$ and $F_o$ do not cause the same minimal 1-errors, and in fact, $(M, F_o)$ being $(D, 0)$-1-diagnosable does not imply that $(M, U)$ is $(D, 0)$-1-diagnosable. These statements are proved in the following result.

Theorem 4.4: $(M, F_o)$ is $(D, 0)$-2-diagnosable if and only if $(M, U)$ is $(D, 0)$-2-diagnosable. However, $(M, F_o)$ being $(D, 0)$-1-diagnosable does not imply that $(M, U)$ is $(D, 0)$-1-diagnosable.
Proof: Let $(M, F_o)$ be $(D, 0)$-2-diagnosable. Let $(r, ya, w)$, where $a \in I$, be a minimal 2-error which is caused by $f \in U$. To show that $(M, U)$ is $(D, 0)$-2-diagnosable it suffices to show that $\beta^D_r([\beta^f_r(ya), ya]) \ne 0^{|ya|}$. Since $(r, ya, w)$ is a minimal error, $\beta_r(y) = \beta^f_r(y)$ and $\beta_r(ya) \ne \beta^f_r(ya)$. Let $b$ be the last symbol of $\beta^f_r(ya)$, and consider the fault $f' \in F_o$ which is caused by the output of $M$ becoming stuck-at-$b$ at time $|y|$. Then $\beta^{f'}_r(ya) = \beta^f_r(ya)$, and $f'$ also causes the minimal 2-error $(r, ya, w)$. Since $(M, F_o)$ is $(D, 0)$-2-diagnosable we know that $\beta^D_r([\beta^{f'}_r(ya), ya]) \ne 0^{|ya|}$. Hence $\beta^D_r([\beta^f_r(ya), ya]) \ne 0^{|ya|}$ and $(M, U)$ is $(D, 0)$-2-diagnosable.

Now assume that $(M, U)$ is $(D, 0)$-diagnosable. Since $F_o \subseteq U$, it follows immediately that $(M, F_o)$ is $(D, 0)$-diagnosable.

We prove that $(M, F_o)$ being $(D, 0)$-1-diagnosable does not imply that $(M, U)$ is $(D, 0)$-1-diagnosable by supplying a counterexample. Let $\tilde{M}_1$, $M_1$, $D_1$, and $\sigma_3: Z \to \tilde{Z}$ be specified by the tables in Fig. 4.4. Then $\tilde{M}_1$ is reduced and reachable, and $M_1$ realizes $\tilde{M}_1$ under $(e, e, \sigma_3)$.
Since $|Q_{D_1}| < |\tilde{Q}_1|$ we know from Theorem 4.3 that $(M_1, U)$ is not $(D_1, 0)$-1-diagnosable. To see that $(M_1, F_o)$ is $(D_1, 0)$-1-diagnosable takes a bit of analysis. Briefly, states $p$, $s$, and $t$ of $D_1$ duplicate states $a$, $d$, and $e$ of $M_1$, and any error which occurs when $M_1$ is in one of these states is immediately detected. If $M_1$ is in $b$ or $c$ then $D_1$ will be in $q$, and if the output becomes stuck-at-2 or stuck-at-3 at this time it will be immediately detected. If $M_1$ is in $b$ or $c$ and a stuck-at-0 or stuck-at-1 fault occurs then it will be tolerated for one time step and detected the next. This establishes the result.

In the above counterexample it is clear that $(M_1, F_o)$ is not $(D_1, 0)$-2-diagnosable, because a stuck-at-1 fault which occurs when $M_1$ is in $b$ causes a 2-error which is not immediately detected. Therefore this example also proves that, in general, $(M, F)$ being $(D, k)$-1-diagnosable does not imply that $(M, F)$ is $(D, k)$-2-diagnosable. Also, if $(M_1, F_o)$ were $(D, 0)$-2-diagnosable for some $D$, then by Theorem 4.4 $(M_1, U)$ would be $(D, 0)$-2-diagnosable, and from Theorems 3.2 and 4.3 it would follow that $|Q_D| \ge |\tilde{Q}_1|$. Hence this is also an example of how 1-diagnosis may be achieved with a detector which is less complex than the least complex detector which is sufficient for 2-diagnosis.
4.4 Diagnosis with Nonzero Delay

Suppose now that we allow some arbitrary, but fixed, delay k > 0 in the detection process. Can this additional time be traded off for less detector complexity? Unfortunately, for the unrestricted case, the answer is no. In fact, if (M, U) is (D', k)-1-diagnosable then we can construct a detector D, essentially by eliminating unnecessary states of D', such that (M, U) is (D, 0)-1-diagnosable.

Before stating this result formally, we will establish an important lemma.

Lemma 4.1: If (M, U) is (D', k)-1-diagnosable then there exists a detector D such that |Q_D| ≤ |Q_{D'}|, (M, U) is (D, k)-1-diagnosable, and for each q ∈ Q_D, λ_D(q, (z, a)) = 0 for some (z, a) ∈ Z × I.

Proof: Assume that (M, U) is (D', k)-1-diagnosable and construct D from D' as follows:

1) Delete from the state table of D' any row corresponding to a state q for which

$$0 \notin \{\lambda_{D'}(q,(z,a)) \mid (z,a) \in Z \times I\}.$$

2) In the resulting table, replace every reference to the deleted state with a reference to an arbitrary remaining state, and set the corresponding output to 1.

3) Repeat steps 1) and 2) until no further deletions are possible.

Since |Q_{D'}| < ∞ the above algorithm will terminate in a finite number of iterations.

From the nature of the above construction it is clear that |Q_D| ≤ |Q_{D'}| and for each q ∈ Q_D, λ_D(q, (z, a)) = 0 for some (z, a) ∈ Z × I. It only remains to be shown that (M, U) is (D, k)-1-diagnosable.

If the detector D' is in a state q for which 0 ∉ {λ_{D'}(q, (z, a)) | (z, a) ∈ Z × I}, then an error must have occurred, because if D' is in q then an error detection signal will be emitted regardless of the input to D'. Hence this error could be signaled whenever a transition to q is indicated, and there would be no loss in diagnosis and no possibility of a false alarm. Since all minimal errors which q signaled would then be signaled before D' got to state q, q could be eliminated. This is the essence of what is accomplished in steps 1) and 2).

This elimination process is necessarily iterative because step 2) 
may introduce new states to be deleted. 

Since this construction is diagnosis preserving, (M, U) is (D, k)-1-diagnosable.
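The pruning procedure of steps 1) to 3) can be sketched in Python. This is a minimal sketch under our own assumptions, not the thesis's notation: the detector D' is given as dictionaries `delta` and `lam` keyed by (state, input) pairs, where each input stands for a pair (z, a) ∈ Z × I, and the reset map `rho` is redirected like any other reference to a deleted state. The "arbitrary remaining state" is chosen deterministically (smallest by `repr`) only so that runs are reproducible, and the three-state detector at the end is hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple

Key = Tuple[Hashable, Hashable]  # (detector state, detector input symbol)


def prune_detector(states: Iterable[Hashable], inputs: Iterable[Hashable],
                   delta: Dict[Key, Hashable], lam: Dict[Key, int],
                   rho: Dict[Hashable, Hashable]):
    """Repeatedly delete states whose every output is 1 (steps 1-3 of Lemma 4.1)."""
    states = set(states)
    inputs = list(inputs)
    delta, lam, rho = dict(delta), dict(lam), dict(rho)  # leave the caller's D' intact
    while True:
        # Step 1: states q with 0 not in {lam(q, s) | s an input symbol}.
        doomed = {q for q in states if all(lam[(q, s)] != 0 for s in inputs)}
        if not doomed:
            return states, delta, lam, rho  # step 3: no further deletions possible
        states -= doomed
        if not states:
            raise ValueError("every state was deleted; D' cannot have been a valid detector")
        for q in doomed:  # drop the rows of the deleted states
            for s in inputs:
                del delta[(q, s)]
                del lam[(q, s)]
        keep = min(states, key=repr)  # an arbitrary remaining state
        # Step 2: redirect references to deleted states and set those outputs to 1.
        for key, nxt in delta.items():
            if nxt in doomed:
                delta[key] = keep
                lam[key] = 1
        for r, q in rho.items():
            if q in doomed:
                rho[r] = keep


# Hypothetical three-state detector D': deleting t forces q to be deleted next.
delta = {("p", "x"): "p", ("p", "y"): "q", ("q", "x"): "t",
         ("q", "y"): "p", ("t", "x"): "t", ("t", "y"): "p"}
lam = {("p", "x"): 0, ("p", "y"): 0, ("q", "x"): 0,
       ("q", "y"): 1, ("t", "x"): 1, ("t", "y"): 1}
S, d, l, r = prune_detector({"p", "q", "t"}, ["x", "y"], delta, lam, {"r0": "p"})
print(sorted(S))   # ['p']
print(l)           # {('p', 'x'): 0, ('p', 'y'): 1}
```

In this run the first pass removes only t; redirecting q's transition to t turns q's only 0-output into a 1, so q is removed on the second pass, which is exactly why the construction must iterate.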

Theorem 4.5: If (M, U) is (D', k)-1-diagnosable then there exists a detector D with |Q_D| ≤ |Q_{D'}| such that (M, U) is (D, 0)-1-diagnosable.

Proof: Assume that (M, U) is (D', k)-1-diagnosable. From Lemma 4.1 there exists a detector D such that |Q_D| ≤ |Q_{D'}|, (M, U) is (D, k)-1-diagnosable, and for each q ∈ Q_D, λ_D(q, (z, a)) = 0 for some (z, a) ∈ Z × I.

Claim: (M, U) is (D, 0)-1-diagnosable.

Assume, to the contrary, that (M, U) is not (D, 0)-1-diagnosable. Using induction on the delay of the diagnosis, we will deduce that (M, U) is not (D, m)-1-diagnosable for all m ≥ 0. This will establish the result, for it contradicts the hypothesis that (M, U) is (D, k)-1-diagnosable.

Having assumed that the basis step for our induction is true, we assume that (M, U) is not (D, m)-1-diagnosable for some m ≥ 0, and we must show that this implies (M, U) is not (D, m+1)-1-diagnosable.

Since (M, U) is not (D, m)-1-diagnosable, there exists a minimal 1-error (r, x, y) caused by f ∈ U and a sequence v ∈ I* with |v| = m such that β^D_{ρ_D(r)}([β^f_r(xv), xv]) = 0^{|xv|}. Let δ_D(ρ_D(r), [β^f_r(xv), xv]) = s.

Let (z, a) ∈ Z × I be such that λ_D(s, (z, a)) = 0. By Lemma 4.1 we know that such a (z, a) exists. Let f' ∈ U be a fault for which β^{f'}_r(xva) = β^f_r(xv)z. Then (r, x, β^{f'}_r(x)) is a minimal 1-error but β^D_{ρ_D(r)}([β^{f'}_r(xva), xva]) = 0^{|xva|}. Hence (M, U) is not (D, m+1)-1-diagnosable. Therefore, (M, U) is not (D, 0)-1-diagnosable implies (M, U) is not (D, m)-1-diagnosable for all m ≥ 0.

But we know that (M, U) is (D, k)-1-diagnosable. Hence (M, U) is (D, 0)-1-diagnosable. This establishes the result.
Corollary 4.5.1: If (M, U) is (D, k)-1-diagnosable then |Q_D| ≥ |Q̂|.

Proof: This is an immediate consequence of Theorem 4.5 and Theorem 4.3.

Corollary 4.5.2: If (M, U) is (D, k)-2-diagnosable then |Q_D| ≥ |Q_R|, where M_R is the reduction of M.

Proof: Assume that (M, U) is (D, k)-2-diagnosable, and consider M to be realizing M_R. From Theorem 3.2, it follows that (M, U) is (D, k)-1-diagnosable. The result now follows immediately from Corollary 4.5.1.

Although Corollaries 4.5.1 and 4.5.2 are results of a negative nature, i.e., they tell what is not possible, in conjunction with what we know is possible with duplication they tell us much about the diagnosis of unrestricted faults. They say that regardless of the specific machine under consideration, the diagnosis scheme used, and the delay allowed, any detector which can diagnose the unrestricted faults of a given machine must be essentially as complex as that machine. In particular, with regard to state set size as our measure of complexity, it is impossible to improve upon duplication. This provides an answer to Question II, page 11. These results also answer Question III; namely, for unrestricted faults no space-time tradeoff is possible, i.e., greater allowable delays in diagnosis cannot be traded off for lessened detector complexity.







We know from Theorem 4.4 and Corollary 4.3.1 that (M, F_o) is (D, 0)-2-diagnosable implies |Q_D| ≥ |Q_R|. Can this result be generalized as was done for unrestricted faults by the previous corollary? The following example shows that the answer is no. It also illustrates a case in which a space-time tradeoff is possible.

Example 4.1: Consider machines M_2 and D_2 of Fig. 4.5. Since M_2 is reduced and reachable, |Q_2| = |Q_R|, where M_R is the reduction of M_2.

Fig. 4.5. Machines M_2 and D_2
Note that no output symbol can appear next to itself in any output sequence produced by M_2. Since D_2 will produce an error detection signal precisely when two consecutive inputs to it are identical, it can detect all permanent output faults of M_2 with a delay of at most one. Therefore (M_2, F_o) is (D_2, 1)-2-diagnosable, yet |Q_{D_2}| < |Q_R|.
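The detection mechanism of Example 4.1 can be sketched in Python. The tables of Fig. 4.5 are not reproduced here, so the fault-free output sequence below is a hypothetical one, chosen only to have the stated property that no symbol appears next to itself. The detector's state is simply the previous symbol it received, and the helper `stuck_at` (our own name) models a permanent output fault.

```python
from typing import List, Optional, Sequence


def repeat_detector(outputs: Sequence[str]) -> List[int]:
    """Emit 1 whenever the current symbol equals the previous one, else 0."""
    signals: List[int] = []
    last: Optional[str] = None  # detector state; None just after reset
    for z in outputs:
        signals.append(1 if z == last else 0)
        last = z
    return signals


def stuck_at(outputs: Sequence[str], t: int, b: str) -> List[str]:
    """Permanent output fault: from time t on, the output is stuck at b."""
    return [z if i < t else b for i, z in enumerate(outputs)]


good = list("0120120")            # hypothetical fault-free output of M_2
bad = stuck_at(good, 1, "2")      # first wrong symbol appears at time 1
print(repeat_detector(good))      # [0, 0, 0, 0, 0, 0, 0]
print(repeat_detector(bad))       # [0, 0, 1, 1, 1, 1, 1]
```

Here the first wrong symbol appears at time 1 but is flagged only at time 2, so a delay of one step is needed, matching the (D_2, 1) in the example.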



CHAPTER V 


Diagnosis Using Inverse Machines 

It is well known that many circuits can be diagnosed by what is commonly called a "loop check." This involves regenerating the input to the circuit from the output and then comparing the regenerated input with the actual input. Often the "inverse" circuit is easier to implement than the original circuit, thus providing a savings over duplication. For example, division can be checked using multiplication. It is also possible to have greater confidence in a loop check than in duplication, especially if the checking circuit is less complex than the original circuit.

In this chapter we will investigate the use of "inverse machines" for diagnosis using a loop check. Informally, machine M̄ is an inverse of machine M if M̄ can reconstruct the input to M from its output with at most a finite delay.

Machines which have inverses can be characterized as being those machines which are "information lossless." Information lossless machines are machines whose behavior functions satisfy a condition which is similar to, but weaker than, the condition which a 1-1 function must satisfy.

Information lossless machines and inverse machines were first introduced by Huffman [18]. Huffman devised a test for information losslessness and for the existence of inverses. It should be pointed out that our definitions of these notions are slightly less general than Huffman's. The definitions in this paper are directed towards the use of inverse machines for diagnosis. Even [13] later devised a better means of determining information losslessness, and he presented two means for obtaining inverse machines.

Information lossless machines and inverse machines are also discussed in textbooks by Kohavi [21] and Hennie [17]. Kohavi provides a fuller description of Even's techniques for obtaining inverse machines, and Hennie describes a different means of obtaining inverse machines.

The questions about the use of inverse machines for diagnosis which we seek to answer in this chapter are: When can an inverse be used for the diagnosis of unrestricted faults? Given a machine M and an inverse M̄ of M, what will be the delay in diagnosis if M̄ is used to diagnose M using a loop check? How can an arbitrary machine be realized so that unrestricted fault diagnosis is possible using a loop check?

We concentrate on unrestricted fault diagnosis in this chapter because this is the most natural and important fault class which can be diagnosed using a loop check. Inverse machines can be used for the diagnosis of more restricted sets of faults, but synthesis and analysis for more general levels of diagnosis seem to be very difficult.
5.1 Inverses of Machines

Before the inverse of a machine can be formally defined, one 
preliminary notion must be introduced. 


Definition 5.1: An (I, n)-delay machine (delay machine) is a machine M_n = (I, I^n, I, δ, λ, R, ρ) such that if a_i ∈ I, 1 ≤ i ≤ n + 1, then

$$\delta((a_1, \ldots, a_n), a_{n+1}) = (a_2, \ldots, a_{n+1})$$

and

$$\lambda((a_1, \ldots, a_n), a_{n+1}) = a_1.$$

An (I, n)-delay machine simply delays its input for n time steps. Stated more precisely, if M_n is an (I, n)-delay machine then

$$\beta_{(a_1, \ldots, a_n)}(a_{n+1} \cdots a_{n+m}) = a_1 \cdots a_m.$$
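As a concrete check of Definition 5.1, the following Python sketch simulates an (I, n)-delay machine started in a given state (a_1, ..., a_n); the function name is our own. The state is kept as a queue of the last n inputs, oldest first, so each step shifts the new symbol in and emits the oldest one.

```python
from collections import deque
from typing import Hashable, List, Sequence, Tuple


def run_delay_machine(state: Tuple[Hashable, ...], inputs: Sequence[Hashable]) -> List[Hashable]:
    """Simulate an (I, n)-delay machine started in state (a_1, ..., a_n)."""
    buf = deque(state)              # current state: the last n inputs, oldest first
    out: List[Hashable] = []
    for a in inputs:
        buf.append(a)               # delta: (a_1..a_n), a_{n+1} -> (a_2..a_{n+1})
        out.append(buf.popleft())   # lambda: emit a_1, the oldest symbol
    return out


print(run_delay_machine(("x", "y"), "abc"))  # ['x', 'y', 'a']
```

With n = 0 the state is the empty tuple and the machine reduces to the identity on I, which is the combinational case used later with 0-delayed inverses.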

Definition 5.2: Let M and M̄ be two machines such that R̄ = R and Z = Ī. M̄ is an (n-delayed) inverse of M if there exists an (I, n)-delay machine M_n with reset alphabet R such that for all r ∈ R and x ∈ I⁺

$$\bar\beta_r(\beta_r(x)) = \beta^n_r(x),$$

where β̄_r and β^n_r denote the behaviors of M̄ and M_n, respectively.

Note that if M̄ is an inverse of M then I ⊆ Z̄. However, it is not necessary to have I = Z̄. Symbols which are in Z̄ but not in I can be useful for diagnosis. Since they will never appear while M̄ is receiving its input from M, the appearance of one immediately signifies that an error has occurred.

M̄ might more properly have been dubbed a "right inverse" of M, for if M̄ is an inverse of M it is not necessarily true that M is an inverse of M̄. This is illustrated in Example 5.1. This example is a counter-example to the claims of Kohavi [21] and Even [13] that if M̄ is an inverse of M then M is an inverse of M̄.

Example 5.1: Consider machines M_1 and M̄_1 of Fig. 5.1. M̄_1 is a 0-delayed inverse of M_1, but M_1 is not an inverse of M̄_1.

Fig. 5.1. Machines M_1 and M̄_1

In fact, there is no machine which is an inverse of M̄_1. This is because the input symbols 0 and 2 are equivalent, and so there is no way in which they can be distinguished once they have been applied.

Intuitively, machines which have inverses lose no information as they transform sequences from I⁺ into sequences from Z⁺. This intuitive notion is captured in the following definition.
Definition 5.3: A machine M is information lossless of delay n if for all r ∈ R and a_1a_2...a_m, b_1b_2...b_m ∈ I⁺ (a_i, b_i ∈ I, 1 ≤ i ≤ m)

$$\beta_r(a_1 a_2 \cdots a_m) = \beta_r(b_1 b_2 \cdots b_m)$$

implies a_i = b_i for 1 ≤ i ≤ m − n.

M is said to be lossless if it is information lossless of delay n for some nonnegative integer n. M is lossy if it is not lossless.


Example 5.2: Machine M_1 of Fig. 5.1 is information lossless of delay 0 and machine M̄_1 of Fig. 5.1 is lossy.


Fig. 5.2. Machine M in Series with an Inverse M̄ of M

Referring to Fig. 5.2, if M is lossless and M̄ is an inverse of M then intuitively no information is lost as sequences from I⁺ are transformed into sequences from Z⁺ by M. The same is true for the entire process, which consists of transforming sequences from I⁺ into sequences from Z⁺ and then back again. Therefore it is somewhat surprising to see, as we have in Example 5.2, that M̄ may be lossy. This may occur because while M̄ must lose no information in transforming the sequences it observes at the output of M, M may not be capable of producing all possible output sequences. Thus while M̄ must be lossless with respect to a subset of Z⁺ it may be lossy with respect to all of Z⁺.

Even [13] gives an algorithm for determining if a given machine 
is lossless, and if so, of what delay. It is particularly easy to 
determine whether a given machine is lossless of delay 0. This is 
because a machine M is lossless of delay 0 if and only if the output 
symbols in every row which corresponds to a reachable state are all 
distinct. 
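The delay-0 test just described is easy to mechanize. In the following sketch (our own encoding, not taken from Even [13]) a machine is given by dictionaries `delta` and `lam` on (state, input) pairs together with its reset states. States reachable from a reset state are found by breadth-first search, and each of their rows is checked for repeated output symbols. The 4-state machine at the end is hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Key = Tuple[Hashable, Hashable]  # (state, input symbol)


def reachable(resets: Iterable[Hashable], inputs: List[Hashable],
              delta: Dict[Key, Hashable]) -> Set[Hashable]:
    """States reachable from some reset state, found by breadth-first search."""
    seen = set(resets)
    todo = deque(seen)
    while todo:
        q = todo.popleft()
        for a in inputs:
            nxt = delta[(q, a)]
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def lossless_delay0(resets, inputs, delta, lam) -> bool:
    """True iff every row of a reachable state has pairwise distinct outputs."""
    inputs = list(inputs)
    return all(len({lam[(q, a)] for a in inputs}) == len(inputs)
               for q in reachable(resets, inputs, delta))


# A hypothetical 4-state machine with I = {0, 1} and Z = {0, 1, 2, 3};
# each entry maps (state, input) to (next state, output).
table = {("a", 0): ("b", 0), ("a", 1): ("d", 3), ("b", 0): ("c", 1), ("b", 1): ("a", 0),
         ("c", 0): ("d", 2), ("c", 1): ("b", 1), ("d", 0): ("a", 3), ("d", 1): ("c", 2)}
delta = {k: v[0] for k, v in table.items()}
lam = {k: v[1] for k, v in table.items()}
print(lossless_delay0(["a"], [0, 1], delta, lam))  # True
```

Changing a single output so that some reachable row repeats a symbol (for example, making both outputs of state b equal to 1) makes the test return False.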

Machines for which inverse machines exist can be characterized as being precisely those machines which are lossless. More precisely,

Theorem 5.1: M has an n-delayed inverse if and only if M is information lossless of delay n.

Proof: (Necessity) Assume that M̄ is an n-delayed inverse of M. Let r ∈ R and a_1...a_m, b_1...b_m ∈ I⁺ (a_i, b_i ∈ I, 1 ≤ i ≤ m) be such that β_r(a_1...a_m) = β_r(b_1...b_m). We must show that a_i = b_i for all i, 1 ≤ i ≤ m − n.

Since M̄ is an n-delayed inverse of M there exists an (I, n)-delay machine M_n such that β̄_r(β_r(x)) = β^n_r(x) for all x ∈ I⁺. In particular, for all ℓ, n < ℓ ≤ m, the ℓ-th symbol of β̄_r(β_r(a_1...a_ℓ)) = β^n_r(a_1...a_ℓ) is a_{ℓ−n}, and the ℓ-th symbol of β̄_r(β_r(b_1...b_ℓ)) = β^n_r(b_1...b_ℓ) is b_{ℓ−n}.

Now β_r(a_1...a_m) = β_r(b_1...b_m) implies β_r(a_1...a_ℓ) = β_r(b_1...b_ℓ) for all ℓ, 1 ≤ ℓ ≤ m, since these are prefixes of equal length. Therefore a_{ℓ−n} = b_{ℓ−n} for all ℓ, n < ℓ ≤ m. That is, a_i = b_i for all i, 1 ≤ i ≤ m − n. Hence M is lossless of delay n.

(Sufficiency) Given a machine M which is lossless of delay n, we can show that M has an n-delayed inverse by constructing one. Techniques for constructing inverses of lossless sequential machines can be found in Hennie [17] and Kohavi [21]. With minor modifications to ensure the existence of suitable starting states, these techniques can be used to construct inverses of lossless resettable machines.
5.2 Diagnosis Using Lossless Inverses

If M̄ is an n-delayed inverse of M then, by definition, there exists an (I, n)-delay machine M_n such that β̄_r ∘ β_r = β^n_r. Diagnosis using inverses can be performed by implementing M̄ and M_n and dynamically checking to see if the above relationship holds. The basic configuration for diagnosis using inverses is shown in Fig. 5.3.

Fig. 5.3. On-line Diagnosis Using Inverse Machines

Since an (I, 0)-delay machine is simply a combinational machine which realizes the identity function on I, a detector which uses a 0-delayed inverse will have the form shown in Fig. 5.4.

Fig. 5.4. A Detector which Uses a 0-delayed Inverse
We now state the basic result relating the use of lossless inverses with the diagnosis of unrestricted faults.

Theorem 5.2: Let M be a lossless machine and let M̄ be an n-delayed inverse of M. Let D be constructed from M̄, the (I, n)-delay machine M_n which demonstrates that M̄ is an n-delayed inverse of M, and an Exclusive-OR gate as shown in Fig. 5.3. If M̄ is lossless of delay d then (M, U) is (D, d)-2-diagnosable.

Proof: Since β̄_r(β_r(x)) = β^n_r(x), there will be no false alarms.

Let (r, x, w) be a minimal 2-error caused by a fault f ∈ U. Then β^f_r(x) ≠ β_r(x). Let y ∈ I* with |y| = d. Since β^f_r(xy) and β_r(xy) first differ at position |x| = |xy| − d and M̄ is lossless of delay d, β̄_r(β^f_r(xy)) ≠ β̄_r(β_r(xy)) = β^n_r(xy). The Exclusive-OR gate will detect this inequality, and hence the minimal 2-error will be detected within d time steps of its occurrence. Therefore (M, U) is (D, d)-2-diagnosable.

This result gives an answer to Question V, page 12; namely, the behavioral property of "having a lossless inverse" is conducive to on-line diagnosis, since the unrestricted faults of machines with this property can be diagnosed using a loop check.

It is worth noting that the delay in diagnosis is not the delay of losslessness of M but rather that of its inverse M̄. Thus an n-delayed inverse can be used to achieve diagnosis without delay if it is lossless of delay 0.
Example 5.3: Consider machines M_2 and M̄_2 of Fig. 5.5. M_2 is lossless of delay 2 and M̄_2 is a 2-delayed inverse of M_2. Since M̄_2 is lossless of delay 0 it can be used to form a detector D_2 such that (M_2, U) is (D_2, 0)-2-diagnosable.

Fig. 5.5. Machines M_2 and M̄_2


Example 5.7, which appears later in this chapter, shows that the converse of Theorem 5.2 does not hold. Namely, it is possible to diagnose the unrestricted fault set of a machine using an inverse which is not lossless. However, not all inverses can be used for the diagnosis of unrestricted faults. Example 5.6 shows how a lossy inverse can be useless for diagnosis. The complete characterization of inverses which can be used for unrestricted fault diagnosis is still an open problem.

Given Theorem 5.2 and the observation that an inverse machine may be lossy, an important question is whether every lossless machine has a lossless inverse. This question is presently unanswered. However, it can be shown that if M is lossless of delay 0 then there exists a lossless inverse of M.

The following example shows that it is possible to diagnose the 
unrestricted fault set of a machine using a lossless inverse which 
has fewer states than the reduction of the machine being diagnosed. 

Example 5.4: Consider machines M_3 and M̄_3 of Fig. 5.6. M̄_3 is a 2-delayed inverse of M_3, and is itself lossless of delay 2.

Fig. 5.6. Machines M_3 and M̄_3

Therefore a detector D_3 can be constructed from M̄_3 and the (I, 2)-delay machine M_2 of Fig. 5.7 such that (M_3, U) will be (D_3, 2)-2-diagnosable. Notice that M_3 is reduced and reachable and that |Q_3| > |Q̄_3|. However, because M_2 is also in the detector, |Q_D| = |Q̄_3| |Q_2| = 16. Therefore |Q_3| < |Q_D|. This is in keeping with what we know from Corollary 4.5.2.

Fig. 5.7. Machine M_2


It is interesting to note that results established in this and 
the preceding chapter have something to say about lossless machines, 
per se. The following result gives a lower bound on the state set 
size of any lossless inverse of a lossless machine M. This bound 
is stated in terms of the input alphabet size of M, the delay of losslessness of M, and the state set size of M_R. This result, which deals only with lossless and inverse machines, is proved using Corollary 4.5.1 and Theorem 5.2, which are results dealing with the diagnosis of unrestricted faults.


Theorem 5.3: Let M be lossless of delay n, let M_R be the reduction of M, and let M̄ be a lossless n-delayed inverse of M. Then

$$|\bar Q| \geq \frac{|Q_R|}{|I|^n}.$$

Proof: Consider M to be realizing its reduction M_R, and consider M and M̄ in the configuration used for diagnosis shown in Fig. 5.3. Since M̄ is lossless, by Theorem 5.2 (M, U) is (D, d)-2-diagnosable, where d is the delay of losslessness of M̄. Now by Corollary 4.5.1 |Q_D| ≥ |Q_R|. Since Q_D = Q̄ × I^n, |Q_D| = |Q̄| |I|^n. Thus |Q̄| |I|^n ≥ |Q_R|.



If one has a lossless machine M of unknown delay and a lossless inverse M̄ of M, then a lower bound on the delay n of M can be found using the following inequality:

$$n \geq \frac{\log |Q_R| - \log |\bar Q|}{\log |I|}.$$

This inequality was obtained directly from the one in Theorem 5.3.
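Because n is an integer, it is safer to evaluate this bound exactly than with floating-point logarithms: by Theorem 5.3, n is at least the smallest integer with |Q̄| · |I|^n ≥ |Q_R|. A minimal sketch (the function name and the sample sizes are our own), assuming |I| ≥ 2 so that the logarithmic form above is defined:

```python
def min_inverse_delay(q_r: int, q_inv: int, i_size: int) -> int:
    """Smallest integer n >= 0 with q_inv * i_size**n >= q_r (Theorem 5.3)."""
    if i_size < 2 or q_inv < 1:
        raise ValueError("requires |I| >= 2 and |Q-bar| >= 1")
    n = 0
    while q_inv * i_size ** n < q_r:
        n += 1
    return n


print(min_inverse_delay(16, 4, 2))  # 2
print(min_inverse_delay(17, 4, 2))  # 3
```

The integer loop avoids the rounding risk of computing the quotient of logarithms and then taking a ceiling.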

Given a machine M = (I, Q, Z, δ, λ, R, ρ), let Z' denote the subset of Z which may actually appear in an output sequence of M. That is, let Z' be the set of all symbols z ∈ Z which occur in β_r(x) for some r ∈ R and x ∈ I⁺.

The following result gives a very simple necessary condition 
which all lossless machines must satisfy. 

Theorem 5.4: If M is lossless then |I| ≤ |Z'|.

Proof: Assume that M is lossless of delay n. Let f_r: I⁺ → Z⁺ × Q be defined by f_r(x) = (β_r(x), δ(ρ(r), x)).

Claim: f_r is 1-1.

Let x, y ∈ I⁺ where x ≠ y. If |x| ≠ |y| then |β_r(x)| ≠ |β_r(y)| and hence f_r(x) ≠ f_r(y). Thus it suffices to show that f_r restricted to inputs of the same length is 1-1. Let |x| = |y| and assume, to the contrary, that f_r(x) = f_r(y). Then β_r(x) = β_r(y) and δ(ρ(r), x) = δ(ρ(r), y). This implies that β_r(xz) = β_r(yz) for all z ∈ I*, and, in particular, for some z of length n. Since M is lossless of delay n this implies that x = y. Contradiction. Hence if |x| = |y| and x ≠ y then f_r(x) ≠ f_r(y). This establishes the claim.

Since f_r: I⁺ → Z⁺ × Q is 1-1, |x| = |β_r(x)|, and every symbol of β_r(x) lies in Z', it follows that |I|^m ≤ |Z'|^m |Q| for all m > 0. Hence |I|^m / (|Z'|^m |Q|) ≤ 1 for all m > 0. Since |Q| is a fixed positive integer, this implies that |I|/|Z'| ≤ 1, or |I| ≤ |Z'|.
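Theorem 5.4 yields a quick screening test: compute Z' and compare its size with |I|. In the sketch below (our own dictionary encoding on (state, input) pairs; the example machine is hypothetical), Z' is collected from the rows of the states reachable from a reset state, which is exactly the set of symbols that can occur in some β_r(x). Failing the test proves the machine lossy; passing it proves nothing.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set, Tuple

Key = Tuple[Hashable, Hashable]  # (state, input symbol)


def used_outputs(resets: Iterable[Hashable], inputs: Iterable[Hashable],
                 delta: Dict[Key, Hashable], lam: Dict[Key, Hashable]) -> Set[Hashable]:
    """Z': the output symbols emitted from states reachable from a reset state."""
    inputs = list(inputs)
    seen = set(resets)
    todo = deque(seen)
    z_prime: Set[Hashable] = set()
    while todo:
        q = todo.popleft()
        for a in inputs:
            z_prime.add(lam[(q, a)])
            nxt = delta[(q, a)]
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return z_prime


def passes_theorem_5_4(resets, inputs, delta, lam) -> bool:
    """Necessary condition for losslessness: |I| <= |Z'|."""
    inputs = list(inputs)
    return len(inputs) <= len(used_outputs(resets, inputs, delta, lam))


# Hypothetical one-state machine with I = {0, 1, 2} but only two usable outputs,
# plus an unreachable state u whose output 9 therefore does not count toward Z'.
delta = {("s", a): "s" for a in (0, 1, 2)}
delta.update({("u", a): "s" for a in (0, 1, 2)})
lam = {("s", 0): 0, ("s", 1): 1, ("s", 2): 1, ("u", 0): 9, ("u", 1): 9, ("u", 2): 9}
print(used_outputs(["s"], [0, 1, 2], delta, lam))        # {0, 1}
print(passes_theorem_5_4(["s"], [0, 1, 2], delta, lam))  # False, so the machine is lossy
```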

This result has some immediate corollaries concerning inverses 
of lossless machines. 

Corollary 5.4.1: Let M be a lossless machine with |I| < |Z'|. Then any inverse M̄ of M with Z̄ = I is lossy.

Proof: Let M̄ be an inverse of M with Z̄ = I. Since M̄ is an inverse of M, Z' ⊆ Ī, and we know that |I| < |Z'|. Hence |Z̄| = |I| < |Z'| ≤ |Ī|. Since the symbols which M̄ can actually emit form a subset of Z̄, by Theorem 5.4 M̄ must be lossy.

This corollary says that if M is lossless and |I| < |Z'| then, for an inverse M̄ of M to be lossless, M̄ must have output symbols which would never appear while M̄ is receiving its input from M. However, if a fault occurs to M and causes an error then M̄ could emit one of these symbols. The appearance of one of these symbols in M̄'s output would immediately cause an error detection signal, because this same symbol cannot appear in the output of an (I, n)-delay machine.

Corollary 5.4.2: Let M be a lossless machine with a lossless inverse M̄. If Z̄ = I then |I| = |Z'|.

Proof: This follows immediately from Corollary 5.4.1 and Theorem 5.4.

Given the above result, an immediate question is whether M being lossless with |I| = |Z'| implies that every inverse M̄ of M is lossless. As Example 5.5 shows, the answer is no.

Example 5.5: Consider machine M*_3 of Fig. 5.8. M*_3 is an inverse of machine M_3 of Fig. 5.6 and Z*_3 = I_3, but M*_3 is not lossless.

Fig. 5.8. Machine M*_3
5.3 Applicability of Inverses for Unrestricted Fault Diagnosis

The use of inverses as a technique for performing diagnosis applies directly only to those machines which have suitable inverses. In the following development it is shown that given an arbitrary machine M', one can always construct a realization M of M' such that M has an inverse which can be used for diagnosis. These lossless realizations are obtained simply by augmenting the output of the original machine. Thus it is shown that diagnosis using inverses is a universally applicable technique, and a part of Question I, page 11, is answered.


Definition 5.4: M is an output-augmented realization of M' = (I', Q', Z', δ', λ', R', ρ') if M = (I', Q', Z' × A, δ', λ, R', ρ') and λ = λ' × λ_A, i.e., λ(q, a) = (λ'(q, a), λ_A(q, a)), for some λ_A: Q' × I' → A.

If M is an output-augmented realization of M' then M realizes M' under (e, e, P_{Z'}), where P_{Z'} is the projection of Z' × A onto Z'.

Kohavi and Lavallee [20] have given a construction which proves the following results.


Theorem 5.5: Given any machine M', there exists an output-augmented realization M of M' which is lossless of delay n for some n, and in particular, for n = 0.


Theorem 5.6: If M' is lossless of delay n, then for every m, 0 ≤ m ≤ n, there exists an output-augmented realization M of M' which is lossless of delay m.
The method that Kohavi and Lavallee use to achieve the above results employs a "testing graph" which is used to determine if the given machine M' is lossless, and if so of what delay. Output augmentation which will yield the desired property is determined by a method of cutting branches in this graph. Minimal augmentation for losslessness of a desired delay is not guaranteed.

A lower bound on the amount of output augmentation necessary to make a particular machine lossless is given by Theorem 5.4. This result tells us that for the output-augmented realization to be lossless, the number of output symbols it can actually produce (and hence the size of its output alphabet) must be at least as great as the size of its input alphabet.

Any machine can be made lossless of delay 0 simply by aug- 
menting its output with a copy of the input. This gives an upper 
bound on the amount of output augmentation which is necessary to 
make a given machine lossless of delay 0. 
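The copy-of-the-input augmentation can be sketched in a few lines. The new output function is λ'(q, a) = (λ(q, a), a); since the second component differs for different inputs and the transition function is unchanged, every row of the augmented table has distinct outputs, which is the delay-0 criterion stated earlier in this chapter. The dictionary encoding and the one-row example are our own assumptions, and `rows_distinct` checks every row rather than only reachable ones, which is a sufficient condition.

```python
from typing import Dict, Hashable, Tuple

Key = Tuple[Hashable, Hashable]  # (state, input symbol)


def augment_with_input(lam: Dict[Key, Hashable]) -> Dict[Key, Tuple[Hashable, Hashable]]:
    """Output-augmented realization: the new output is (lambda(q, a), a)."""
    return {(q, a): (z, a) for (q, a), z in lam.items()}


def rows_distinct(lam: Dict[Key, Hashable]) -> bool:
    """True iff no row of the table repeats an output symbol (checks every row)."""
    seen = set()
    for (q, _a), z in lam.items():
        if (q, z) in seen:
            return False
        seen.add((q, z))
    return True


lam = {("s", 0): "z", ("s", 1): "z"}                 # hypothetical row with a repeat
aug = augment_with_input(lam)
print(rows_distinct(lam), rows_distinct(aug))        # False True
print({k: v[0] for k, v in aug.items()} == lam)      # True: projecting recovers lambda
```

The last line corresponds to the realization under the projection P_{Z'} in Definition 5.4: discarding the added component gives back the original output function.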

It is tempting to use the Kohavi and Lavallee technique to augment an inverse of a machine in the hope of achieving a lossless inverse. However, this approach fails in general, because an output-augmented realization of an inverse M̄ of M is not necessarily an inverse of M.

Example 5.6: Consider the configuration shown in Fig. 5.9. Here M' is any machine, and M is the output-augmented realization of M' which was formed simply by augmenting the output of M' with a copy of its input. The inverse M̄ of M shown in this figure is simply the combinational machine which realizes the projection of Z' × I onto I. This inverse is lossy and is clearly useless for diagnosis.

Fig. 5.9. A Lossless Machine with a Lossy Inverse

Now augment the output of M̄ to form the machine M̄' shown in Fig. 5.10. This machine is lossless, but it is not an inverse of M, and it too is useless for diagnosis.

Fig. 5.10. An Output-augmented Realization of M̄ of Fig. 5.9

Although Kohavi and Lavallee's technique cannot be used to 
construct lossless inverses, it is an important technique because 
it can be used to construct lossless of delay 0 realizations of any 
given machine. The following result shows that given a machine 
which is lossless of delay 0, an inverse of that machine can be 
constructed which can be used for the diagnosis of unrestricted 
faults. 


Theorem 5.7: Let M be lossless of delay 0. Then there exists an inverse M̄ of M such that (M, U) is (D, 0)-2-diagnosable, where D is formed from M̄ and an Exclusive-OR gate as shown in Fig. 5.4.

Proof: Let M = (I, P, Z, δ, λ, R, ρ) and let M̄ = (Z, P, I ∪ {e}, δ̄, λ̄, R, ρ), where e ∉ I and for all q ∈ P and a ∈ Z

$$\bar\delta(q,a) = \begin{cases} \delta(q,b) & \text{if } b \in I \text{ and } \lambda(q,b) = a \\ \text{arbitrary} & \text{if } a \notin \lambda(q,I) \end{cases}$$

$$\bar\lambda(q,a) = \begin{cases} b & \text{if } b \in I \text{ and } \lambda(q,b) = a \\ e & \text{if } a \notin \lambda(q,I) \end{cases}$$

Thus M̄ is basically the same as M but with the roles of the input and output interchanged.

The functions δ̄ and λ̄ are well-defined, for if M is lossless of delay 0 and q ∈ P then λ(q, a) = λ(q, b) implies a = b.

If |I| < |Z| then not every symbol in Z can appear in every row of the state table of M. This is what gives rise to the transitions of M̄ which may be arbitrarily specified.

Consider M and M̄ to be operating in series as shown in Fig. 5.2. Since M and M̄ have the same reset function, they will initially be in the same state. Now if M and M̄ are both in some state q ∈ P and the input symbol b ∈ I is applied to M, then M will emit λ(q, b) and go to state δ(q, b). M̄ will emit λ̄(q, λ(q, b)) = b and will go to state δ̄(q, λ(q, b)) = δ(q, b). Thus M and M̄ will make the same state transitions and the present output of M̄ will always be the present input to M. Hence M̄ is a 0-delayed inverse of M.

It remains to be shown that (M, U) is (D, 0)-2-diagnosable. This must be shown directly because M̄ is not necessarily lossless.

Since M̄ is a 0-delayed inverse of M there will be no false alarms. Let (r, xa, wb), where a ∈ I and b ∈ Z, be a minimal 2-error. Since any input sequence applied to M will cause M and M̄ to experience the same state trajectories, and w is the fault-free output of M on x, δ(ρ(r), x) = δ̄(ρ(r), w). Say δ(ρ(r), x) = q. Since (r, xa, wb) is a minimal 2-error, λ(q, a) ≠ b. Now λ̄(q, λ(q, a)) = a and therefore λ̄(q, b) ≠ a (either b = λ(q, c) for some c ≠ a, giving output c, or b ∉ λ(q, I), giving output e ∉ I). This inequality will be detected by the Exclusive-OR gate, which will emit a fault detection signal. Hence (M, U) is (D, 0)-2-diagnosable.
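The construction in the proof of Theorem 5.7 and the detector of Fig. 5.4 can be sketched together in Python. The encoding (dictionaries on (state, symbol) pairs), the string `"e"` for the extra output symbol, and the example machine are our own assumptions. The "arbitrary" transitions are resolved by staying in the current state, and the Exclusive-OR gate is modelled as an inequality test between the regenerated input and the actual input.

```python
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple

Key = Tuple[Hashable, Hashable]  # (state, symbol)
E = "e"  # the extra output symbol e; assumed not to be in I


def build_inverse(states: Iterable[Hashable], inputs: Sequence[Hashable],
                  outputs: Iterable[Hashable], delta: Dict[Key, Hashable],
                  lam: Dict[Key, Hashable]):
    """Interchange the roles of input and output (M lossless of delay 0)."""
    inv_delta: Dict[Key, Hashable] = {}
    inv_lam: Dict[Key, Hashable] = {}
    outputs = list(outputs)
    for q in states:
        for z in outputs:
            matches = [b for b in inputs if lam[(q, b)] == z]  # at most one b
            if matches:
                inv_delta[(q, z)] = delta[(q, matches[0])]
                inv_lam[(q, z)] = matches[0]
            else:
                inv_delta[(q, z)] = q  # arbitrary transition; we stay in q
                inv_lam[(q, z)] = E
    return inv_delta, inv_lam


def loop_check(start, inv_delta, inv_lam, applied: Sequence, observed: Sequence) -> List[int]:
    """Detector of Fig. 5.4: 1 when the regenerated input differs from the actual input."""
    q, alarms = start, []
    for a, z in zip(applied, observed):
        alarms.append(0 if inv_lam[(q, z)] == a else 1)  # the Exclusive-OR comparison
        q = inv_delta[(q, z)]
    return alarms


# Hypothetical machine M with I = {0, 1}, Z = {0, 1, 2, 3}, lossless of delay 0.
table = {("a", 0): ("b", 0), ("a", 1): ("d", 3), ("b", 0): ("c", 1), ("b", 1): ("a", 0),
         ("c", 0): ("d", 2), ("c", 1): ("b", 1), ("d", 0): ("a", 3), ("d", 1): ("c", 2)}
delta = {k: v[0] for k, v in table.items()}
lam = {k: v[1] for k, v in table.items()}
inv_delta, inv_lam = build_inverse("abcd", [0, 1], [0, 1, 2, 3], delta, lam)

x = [0, 1, 1, 0]
q, w = "a", []
for a in x:                      # run the fault-free M from reset state a
    w.append(lam[(q, a)])
    q = delta[(q, a)]
w_bad = w[:2] + [1] + w[3:]      # a hypothetical wrong output at time 2
print(w)                                              # [0, 0, 3, 3]
print(loop_check("a", inv_delta, inv_lam, x, w))      # [0, 0, 0, 0]
print(loop_check("a", inv_delta, inv_lam, x, w_bad))  # [0, 0, 1, 1]
```

The wrong output at time 2 is flagged at time 2, as the theorem requires; only the first alarm is significant, since after an error the later values depend on how the arbitrary transitions were chosen.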
It should be noted that the inverse constructed in the proof of the above theorem is not necessarily lossless. By using |Z| − |I| new symbols, instead of just one, M̄ could have been constructed to be lossless of delay 0.

Example 5.7: Consider machine M̄'_1 of Fig. 5.11. This machine is an inverse of machine M_1 of Fig. 5.1. It was constructed as described in the proof of Theorem 5.7. The transitions of M̄'_1 which may be arbitrarily chosen are indicated by a "-".

              0      1      2      3      R
       a     b/0    -/e    -/e    d/1     r
       b     a/1    c/0    -/e    -/e
       c     -/e    b/1    d/0    -/e
       d     -/e    -/e    c/1    a/0

Fig. 5.11. Machine M̄'_1

This inverse of M_1 is not lossless, but it can be used for the diagnosis of unrestricted faults of M_1.

A lossless inverse M̄''_1 of M_1 can be obtained from M̄'_1 simply by changing one of the "e" outputs in each row of the state table of M̄'_1 to e'. M̄''_1 so constructed would be lossless of delay 0 because the output symbols would be distinct in every row of the state table of M̄''_1.




CHAPTER VI 


Diagnosis of Networks of Resettable Systems

This chapter considers the problem of diagnosing a machine which has been structurally decomposed and is represented as a network of resettable state machines. The networks considered here are very general, and they allow for work within a wide range of structural detail.

The fault set applied to these networks is the set of "unrestricted component faults." Informally, an unrestricted component fault is a fault which only affects one component machine but which may affect that component in an unrestricted manner.

This fault set is a natural restriction of the set of unrestricted faults. We will show that it is possible to diagnose the set of unrestricted component faults of a network with relatively little redundancy.

This chapter focuses on the diagnosis of "state networks." A state network is simply a network in which the external output is the state of the network, i.e., a vector consisting of the state of each component machine in the network. Since the state of a state network is directly observable at its output, state networks are easier to diagnose than arbitrary networks.
The results in this chapter characterize state networks which are diagnosable using combinational detectors. A general construction is given which can be used to augment a given state network such that the resulting state network is diagnosable in the above sense. Upper and lower bounds on the amount of redundancy required by such an augmentation are derived.
6.1 Networks of Resettable Systems

The field of study known as "algebraic structure theory of sequential machines" is concerned with the synthesis and decomposition of sequential machines into networks of smaller component machines. Good discussions of this theory can be found in [2], [16] and [39]. The networks considered in this chapter are very similar to the "abstract networks" introduced by Hartmanis and Stearns [16]. The major differences are in our use of resettable state systems for the components and in our system connection rules, which force all computation to be done in the component systems or in the external output function. Hartmanis and Stearns use sequential state machines for their components, and they allow a combinational function $f_i$ from $(\times_{j=1}^{n} Q_j) \times I$ into $I_i$ to precede each component.

Definition 6.1: A network of resettable systems is a 6-tuple
$N = (I, R, (S_1, \ldots, S_n), (K_1, \ldots, K_n), Z, \lambda)$ where

$I$ is a finite nonempty set, the external input alphabet
$R$ is a finite nonempty set, the external reset alphabet
$S_i = (I_i, Q_i, \delta_i, R, \rho_i)$, for each $i$, $1 \le i \le n$, is a resettable state system, a component system
$K_i$, for each $i$, $1 \le i \le n$, is a subset of $\{Q_1, \ldots, Q_n, I\}$, a system connection rule
$Z$ is a finite nonempty set, the external output alphabet
$\lambda: (\times_{i=1}^{n} Q_i) \times I \times T \to Z$, the external output function

such that for each $i$, $1 \le i \le n$, if $K_i = \{A_1, \ldots, A_k\}$ then $I_i = \times_{j=1}^{k} A_j$.

Under the intended interpretation, the system connection rule $K_i$ specifies from which parts of the network component $i$ receives its input. By the convention introduced in Section 2.1, if $K_i = \emptyset$ then $I_i$ is any singleton set. Therefore if $M_i$ has no connections then it is an autonomous machine.

Example 6.1: The 6-tuple described in Fig. 6.1 specifies network $N_1$. This network has two component machines $M_1$ and $M_2$ with state sets $\{p_1, p_2\}$ and $\{q_1, q_2\}$ respectively. $M_1$ is connected to the external input and the output (state) of $M_2$, and $M_2$ is connected to the external input and the output (state) of $M_1$. Network $N_1$ can be viewed pictorially as shown in Fig. 6.2.
$N_1 = (I, R, (M_1, M_2), (K_1, K_2), Z, \lambda)$

$I = Z = \{0, 1\}$, $R = \{r\}$

$(K_1, K_2) = (\{Q_2, I\}, \{Q_1, I\})$

[State tables of $M_1$ (states $p_1, p_2$; inputs $(q, a) \in Q_2 \times I$) and $M_2$ (states $q_1, q_2$; inputs $(p, a) \in Q_1 \times I$): entries not legible in the source scan.]

External output function:

p     q     a     λ(p, q, a)
p_1   q_1   0     1
p_1   q_1   1     0
p_1   q_2   0     0
p_1   q_2   1     0
p_2   q_1   0     0
p_2   q_1   1     1
p_2   q_2   0     0
p_2   q_2   1     0

Fig. 6.1. Network $N_1$
Fig. 6.2. Diagram of Network $N_1$

Since any machine may be viewed as a one-component network, a network may convey little or no structural information. On the other hand, the structural description given by the network may be very detailed. For example, each component may be a two-state state machine which represents only one flip-flop and one coordinate of the global transition function.

Definition 6.2: A network $N = (I, R, (S_1, \ldots, S_n), (K_1, \ldots, K_n), Z, \lambda)$ defines the system $S_N = (I, Q, Z, \delta, \lambda, R, \rho)$ where

$Q = \times_{i=1}^{n} Q_i$

$\delta(q, a, t) = \delta((q_1, \ldots, q_n), a, t) = \times_{i=1}^{n} \delta_i[q_i, \Pr_{K_i}(q_1, \ldots, q_n, a), t]$

$\rho(r, t) = \times_{i=1}^{n} \rho_i(r, t)$

and $\Pr_{K_i}$ denotes the projection onto the coordinates named in the connection rule $K_i$.
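As a concrete illustration of Definition 6.2, the following Python sketch composes the global transition function of a time-invariant network from component transition functions and connection rules, and then computes the reachable part by breadth-first search from a single reset state. The two components, their transition functions, the reset state (0, 0), and the encoding of a connection rule as a tuple of 1-based component indices plus the string "I" are hypothetical choices made for the example; they are not taken from Fig. 6.1.

```python
from collections import deque

def project(rule, q, a):
    """Pr_{K_i}(q_1, ..., q_n, a): select the coordinates named by rule K_i.

    A rule is a tuple whose entries are 1-based component indices or the
    string "I" (the external input); the result keeps the rule's order.
    """
    return tuple(a if src == "I" else q[src - 1] for src in rule)

def network_delta(deltas, rules, q, a):
    """Global transition of a time-invariant network:
    delta(q, a) = x_i delta_i[q_i, Pr_{K_i}(q, a)]."""
    return tuple(d(q_i, project(k, q, a)) for d, k, q_i in zip(deltas, rules, q))

def reachable(deltas, rules, reset_state, inputs):
    """Reachable part P from a single reset state (breadth-first search)."""
    seen, frontier = {reset_state}, deque([reset_state])
    while frontier:
        q = frontier.popleft()
        for a in inputs:
            nxt = network_delta(deltas, rules, q, a)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

# Hypothetical two-component network with K_1 = {Q_2, I} and K_2 = {Q_1, I}.
deltas = [
    lambda p, inp: (p + inp[0] * inp[1]) % 2,  # inp = (q_2, a)
    lambda q, inp: (inp[0] + inp[1]) % 2,      # inp = (q_1, a)
]
rules = [(2, "I"), (1, "I")]

print(network_delta(deltas, rules, (0, 1), 1))           # (1, 1)
print(sorted(reachable(deltas, rules, (0, 0), [0, 1])))  # [(0, 0), (0, 1), (1, 1)]
```

In this made-up network the state (1, 0) is never entered, so the reachable part is a proper subset of $Q_1 \times Q_2$; the distinction between $Q$ and $P$ is what the detectors later in the chapter exploit.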


A network of resettable machines is a network in which the component systems and the external output function are all time-invariant. For example, network $N_1$ of Fig. 6.1 is a network of machines. The system defined by a network of machines $N$ is also time-invariant, and it will be denoted by $M_N$. A network of machines $N$ realizes a machine $M$ if $M_N$ realizes $M$. Likewise the definitions of reduced machines, reachable machines, and so forth can be extended to apply to networks of machines.


Example 6.2: Consider network $N_1$ of Fig. 6.1. This network defines machine $M_{N_1}$ of Fig. 6.3, and it realizes $M_1$ of Fig. 6.4 because $M_{N_1}$ realizes $M_1$.

Fig. 6.3. Machine $M_{N_1}$

Fig. 6.4. Machine $M_1$


A network $N = (I, R, (S_1, \ldots, S_n), (K_1, \ldots, K_n), Z, \lambda)$ is a state network if $Z = \times_{i=1}^{n} Q_i$ and $\lambda(q, a) = q$ for all $q \in \times_{i=1}^{n} Q_i$ and $a \in I$. If $N$ is a state network then $S_N$ is a state system. For state networks it is unnecessary to explicitly specify the external output alphabet and the external output function.

Since the fault set considered in this chapter does not allow for faults which affect the external output function, we will focus on the diagnosis of state networks which realize state machines. The diagnosis of the output function will be taken care of separately, possibly by duplication.

Diagnosis is, in general, easier for state networks than for arbitrary networks, because with state networks the output function does not mask the internal operation of the network. Decomposing a network into a state network and an output function and then diagnosing each separately has the effect of applying a tighter tolerance relation to the diagnosis of the original network. This, too, is due to the lack of any masking of the state by the output function.
6.2 Unrestricted Component Faults

Suppose that $N$ and $N'$ are networks. Then $f = (N', \tau, \theta)$ is a fault of $N$ if $f = (S_{N'}, \tau, \theta)$ is a fault of $S_N$. Thus a fault of $N$ can be considered to be a transformation of $N$ into another network $N'$ at some time $\tau$. The notions of fault tolerance, error, and diagnosis are extended in a similar manner to apply to networks.

Given a network $N$, a natural set of faults to consider are those which are caused by failures in one component of $N$. If $f = (N', \tau, \theta)$ is caused by failures which are restricted to one component of $N$, then $N'$ will differ from $N$ only in that one component. Likewise the function $\theta$ from $\times Q_i$ into $\times Q'_i$ will act as the identity on each coordinate except possibly the one affected by $f$. These faults are described formally in the following definition.

Definition 6.3: Let $N = (I, R, (M_1, \ldots, M_n), (K_1, \ldots, K_n), Z, \lambda)$ be a network of machines. A fault $f = (N', \tau, \theta)$ of $N$ is an unrestricted component fault if for some $j$, $1 \le j \le n$,

i) $N' = (I, R, (M_1, \ldots, S'_j, \ldots, M_n), (K_1, \ldots, K_n), Z, \lambda)$ where $S'_j \in$ [set illegible in source], and

ii) for all $(q_1, \ldots, q_n) \in \times_{i=1}^{n} Q_i$, $\theta(q_1, \ldots, q_n) = (q'_1, \ldots, q'_n)$ implies $q_i = q'_i$ for all $i \ne j$.

The set of all unrestricted component faults of a network will be denoted by $U_C$.
Note that since $N'$ is a network, $S'_j$ is required to be a state system. Because the output alphabets of $M_j$ and $S'_j$ are identical and they are both state systems, their state sets must also be identical. Thus, unrestricted component faults do not permit state blowup or collapse.

The fault set $U_C$ is sufficiently restricted to make possible its diagnosis with relatively little redundancy. On the other hand, $U_C$ is not unduly restricted, for it allows any number and type of physical failures to occur in any one component, subject, of course, to the general restrictions on faults outlined in Section 2.3. Thus using $U_C$ as the fault class greatly reduces the amount of failure analysis which is necessary within the components.

The relationship between the set of unrestricted component faults of a network and the set of errors that these faults can cause is not as simple as the corresponding relationship for unrestricted faults. Since an unrestricted component fault can affect at most one component directly, it is clear that if $(r, ua, vb)$ is a minimal 2-error caused by $f \in U_C$, then $b$ will be out of tolerance in only one coordinate. However, because the failed component may be connected to any other component, minimal 1-errors do not have this property. Nevertheless, a useful property of minimal 1-errors is brought out later in the proof of Theorem 6.1; namely, if $(r, x, y)$ is a minimal 1-error of a "totally redundant" network $N$ caused by an unrestricted component fault, then under reset $r$ and input sequence $x$ the faulty network will, at some time, enter an unreachable state of $N$.


A natural extension of $U_C$, the set of unrestricted component faults, would be the set of all faults caused by failures in up to $m$ components, where $m$ is some positive integer. Since it is very likely that any single failure which occurs will be detected before a second failure in a different component occurs, the set of unrestricted component faults is the most important special case of this more general set of faults. It is also, notationally, the easiest to discuss. For these reasons the following development is restricted to this case. However, the characterization of combinationally diagnosable networks given in the following section generalizes easily to multiple component faults. This generalization is discussed at the end of that section.

The general approach to the construction of combinationally diagnosable networks used in Section 6.4 also generalizes to the multiple component fault case, although this approach is not felt to be a good approach to the more general problem.
6.3 Characterization of Combinationally Diagnosable Networks
How can state networks for which a combinational detector can diagnose the set of unrestricted component faults be characterized? In this section it is shown that this can be done in terms of the amount of redundancy in the network.

Given a state network of machines $N$, it will be assumed that $N$ realizes some reachable state machine $M$ under the triple $(\sigma_1, \sigma_2, \sigma_3)$. (Since state machines are reduced, $M$ is automatically reduced.) It will be assumed, as before, that $\sigma_1$ and $\sigma_2$ are onto. The reachable part of $N$ will be denoted by $P$.

Notation: Given a state network $N$, let $C \subseteq \{1, \ldots, n\}$ denote a subset of the set of components. Let $\bar{C}_i$ denote the particular subset $\{1, \ldots, i-1, i+1, \ldots, n\}$. Let $q = (q_1, \ldots, q_n)$ and $s = (s_1, \ldots, s_n)$ be states of $N$.

Each $C$ induces a partition $\pi_C$ on $Q = \times_{i=1}^{n} Q_i$ where $q \equiv s \ (\pi_C)$ if and only if $q_i = s_i$ for all $i \in C$.

A cover of a set $L$ is a set of subsets of $L$ whose union is $L$. Thus every partition of $L$ is also a cover of $L$. A cover $J$ of $L$ is a singleton cover if $B \in J$ implies $|B| \le 1$. If $J$ is a cover, let $\#|J|$ denote the cardinality of the largest element in $J$.

The definition of a cover introduced here is more general than the usual notion of a cover (or "set system") as introduced by Hartmanis and Stearns [16]. They employed set systems to obtain series-parallel decompositions. The notion introduced here is not used to obtain decompositions but rather to analyze any given decomposition.

Let $C \subseteq \{1, \ldots, n\}$ and let $\pi_C = \{B_1, \ldots, B_\ell\}$. $C$ induces the cover

$\mathcal{C}_C = \{\sigma_3(B_1 \cap P), \ldots, \sigma_3(B_\ell \cap P)\}$

of the state set of $M$, where, if $B \subseteq P$, then $\sigma_3(B) = \{\sigma_3(q) \mid q \in B\}$. In particular, $\sigma_3(\emptyset) = \emptyset$.

Each set of states which the components in $C$ can take on corresponds directly to a block of the partition $\pi_C$. Thus $\pi_C$ represents the information about the current state of $N$ which is given by the current states of the components in $C$. $\mathcal{C}_C$ represents the corresponding information as to the state of $M$ which $N$ is currently mimicking. If $\mathcal{C}_C$ is a singleton cover then the current states of the components in $C$ completely determine the corresponding state of $M$. Note that $\mathcal{C}_{\{1, \ldots, n\}}$ is always a singleton cover.

Definition 6.4: Component $M_i$ of a network $N$ is redundant if $\mathcal{C}_{\bar{C}_i}$ is a singleton cover. $N$ is totally redundant if every component of $N$ is redundant.

"Redundant components" are essentially the same as "dependent coordinates" as discussed by Zeigler [39]. The basic difference is that the concept of a "redundant component" is defined in terms of covers rather than partitions (as is the case with "dependent coordinates"), and hence is a more general concept which allows for state splitting.

If N is totally redundant then knowledge of the state of any n-1 
components is sufficient to determine the corresponding state of M 
although it may not be sufficient to determine the state of the remain- 
ing component. 

Example 6.3: Consider network $N_1$ of Example 6.1. Let $N'_1$ be the associated state network which is obtained from $N_1$ by changing the external output function and alphabet. Let $M'_1$ be the state machine corresponding to machine $M_1$ of Fig. 6.4. Then $N'_1$ realizes $M'_1$ under $(e, e, \sigma_3)$, where $\sigma_3$, from $P'_1$ to the states of $M'_1$, is given by the following table:

p     q     σ_3(p, q)
p_1   q_1   a
p_1   q_2   b
p_2   q_1   d
p_2   q_2   c

Now $\pi_{\bar{C}_1} = \pi_{\{2\}} = \{\{(p_1, q_1), (p_2, q_1)\}, \{(p_1, q_2), (p_2, q_2)\}\}$ and so

$\mathcal{C}_{\bar{C}_1} = \{\sigma_3\{(p_1, q_1), (p_2, q_1)\}, \sigma_3\{(p_1, q_2), (p_2, q_2)\}\} = \{\{a, d\}, \{b, c\}\}$.

Therefore $\mathcal{C}_{\bar{C}_1}$ is not a singleton cover, $M_1$ is not a redundant component, and $N'_1$ is not totally redundant.
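The cover computation in Example 6.3 can be checked mechanically. The sketch below assumes that $\sigma_3$ is given as a Python dictionary whose keys are exactly the reachable states in $P$. Grouping those states by every coordinate except $i$ yields the nonempty sets $B \cap P$ for $B \in \pi_{\bar{C}_i}$, and mapping each group through $\sigma_3$ yields the cover. Empty blocks contribute $\sigma_3(\emptyset) = \emptyset$ and are simply omitted. The function names are illustrative.

```python
def complement_cover(sigma3, i):
    """Cover induced by C-bar_i (all components except i; i is 0-based).

    States of P that agree on every coordinate except i lie in the same
    block of pi_{C-bar_i}; each such group is mapped through sigma_3.
    """
    groups = {}
    for q, m_state in sigma3.items():
        key = q[:i] + q[i + 1:]              # drop coordinate i
        groups.setdefault(key, set()).add(m_state)
    return {frozenset(g) for g in groups.values()}

def is_redundant(sigma3, i):
    """Component i is redundant iff its complement cover is a singleton cover."""
    return all(len(block) <= 1 for block in complement_cover(sigma3, i))

# sigma_3 of network N'_1 from Example 6.3 (all four states are reachable)
sigma3 = {("p1", "q1"): "a", ("p1", "q2"): "b",
          ("p2", "q1"): "d", ("p2", "q2"): "c"}

print(sorted(sorted(b) for b in complement_cover(sigma3, 0)))  # [['a', 'd'], ['b', 'c']]
print(is_redundant(sigma3, 0), is_redundant(sigma3, 1))         # False False
```

Index 0 corresponds to $M_1$, so the first line reproduces $\mathcal{C}_{\bar{C}_1} = \{\{a, d\}, \{b, c\}\}$. The second line shows that neither component of $N'_1$ is redundant.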


Lemma 6.1: Let $N$ be a totally redundant state network of machines, and let $q = (q_1, \ldots, q_i, \ldots, q_n)$ and $q' = (q_1, \ldots, q'_i, \ldots, q_n)$ be states of $N$. If $q, q' \in P$ then $\sigma_3(q) = \sigma_3(q')$.

Proof: Let $q, q' \in P$. Since $q$ and $q'$ differ only in their $i$th coordinate, they are in the same block of $\pi_{\bar{C}_i}$. Say that $\pi_{\bar{C}_i} = \{B_1, \ldots, B_\ell\}$ and that $q, q' \in B_j$. Since $q, q' \in P$, $q, q' \in B_j \cap P$. Since $N$ is totally redundant, $\mathcal{C}_{\bar{C}_i}$ is a singleton cover, and thus we must have $\sigma_3(q) = \sigma_3(q')$.


Suppose that an unrestricted component fault $f$ occurs to a totally redundant network of machines $N$ and causes a minimal 2-error $(r, x, y)$. Say that $\beta_r(x) = q = (q_1, \ldots, q_n)$. Due to the nature of $f$, namely that it affects only one component, $\beta^f_r(x) = q' = (q_1, \ldots, q'_i, \ldots, q_n)$. If $q' \in P$ then Lemma 6.1 tells us that this 2-error is not a 1-error, because $\sigma_3(q) = \sigma_3(q')$. If $q' \notin P$ then this 2-error could be detected by a combinational detector which flags the unreachable states of $N$. By using the above lemma and Theorem A.2, the following characterization of combinationally diagnosable networks can be obtained.
Theorem 6.1: Let $N$ be a state network of machines which realizes a state machine $M$ under $(\sigma_1, \sigma_2, \sigma_3)$. Then $(N, U_C)$ is $(D, 0)$-1-diagnosable for some combinational detector $D$ if and only if $N$ is totally redundant.


Proof: (Necessity) Suppose that $(N, U_C)$ is $(D, 0)$-1-diagnosable where $D$ is combinational, and let $D$ realize the function $\lambda_D$. Assume, to the contrary, that $N$ is not totally redundant. Then for some $i$, $\mathcal{C}_{\bar{C}_i}$ is not a singleton cover. Hence there exist $q = (q_1, \ldots, q_i, \ldots, q_n)$ and $q' = (q_1, \ldots, q'_i, \ldots, q_n)$ such that $q, q' \in P$ and $\sigma_3(q) \ne \sigma_3(q')$. Since $q, q' \in P$, $\lambda_D(q) = \lambda_D(q') = 0$, for otherwise a false alarm could occur. Let $f \in U_C$ be a fault caused by the output of $M_i$ becoming stuck-at-$q'_i$ at a time when $N$ could be in $q$. This fault can drive the network into $q'$, where $\lambda_D(q') = 0$ although $\sigma_3(q') \ne \sigma_3(q)$, so it can cause a 1-error which is not $(D, 0)$-1-diagnosable. Contradiction. Therefore if $(N, U_C)$ is $(D, 0)$-1-diagnosable where $D$ is combinational, then $N$ must be totally redundant.

(Sufficiency) Assume that $N$ is totally redundant. Let $D$ be the detector which realizes the function $\lambda_D: Q \to \{0, 1\}$ where

$\lambda_D(q) = 0$ if $q \in P$, and $\lambda_D(q) = 1$ if $q \notin P$.

Clearly, $D$ will give no false alarms.

Let $(r, x, y)$ be a minimal 1-error caused by $f \in U_C$. Let $x = uab$ where $a, b \in I$. Then $\sigma_3(\beta^f_r(ua)) = \sigma_3(\beta_r(ua))$ and $\sigma_3(\beta^f_r(uab)) \ne \sigma_3(\beta_r(uab))$. Say $\beta^f_r(ua) = q$. Then $\beta^f_r(uab) = \delta^f(q, a, t)$ where $t = |u|$. Because $f \in U_C$, $f$ can affect at most one component of $N$. Therefore $\delta(q, a)$ will differ in at most one coordinate from $\delta^f(q, a, t)$. Let $\delta(q, a) = s = (s_1, \ldots, s_i, \ldots, s_n)$ and let $\delta^f(q, a, t) = s' = (s_1, \ldots, s'_i, \ldots, s_n)$.

Since $\sigma_3(q) = \sigma_3(\beta_r(ua))$ and $\beta_r(ua) = \delta(\rho(r), u)$, by Theorem A.2 $\sigma_3(\delta(q, a)) = \sigma_3(\delta(\rho(r), ua)) = \sigma_3(\beta_r(uab))$. Thus

$\sigma_3(s) = \sigma_3(\beta_r(uab)) \ne \sigma_3(\beta^f_r(uab)) = \sigma_3(s')$.

If $q \in P$ then $s = \delta(q, a) \in P$, and applying Lemma 6.1 we deduce that $s' \notin P$. Therefore $\lambda_D(s') = 1$ and the 1-error $(r, x, y)$ is detected without delay.

Alternatively, if $q \notin P$ then $\lambda_D(q) = 1$ and the 1-error $(r, x, y)$ is detected one time step before it occurs. Since in either case the error is detected by the time of its occurrence, it follows that $(N, U_C)$ is $(D, 0)$-1-diagnosable.
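The sufficiency half of the proof rests on one property: starting from any reachable state, a change confined to one coordinate either leaves $P$, where $\lambda_D$ flags it, or lands on a reachable state with the same $\sigma_3$ image, so that no error occurs. The Python sketch below checks that property exhaustively for the two networks of Examples 6.3 and 6.4, using only their $\sigma_3$ tables. Component states $p_k$, $q_k$, $s_k$ are encoded by their index $k$. The sketch does not simulate faulty dynamics, and the helper names are illustrative.

```python
def detector(P, q):
    """Combinational detector of Theorem 6.1: 0 on reachable states, 1 otherwise."""
    return 0 if q in P else 1

def single_coordinate_variants(q, state_sets):
    """All states that differ from q in exactly one coordinate."""
    for i, states in enumerate(state_sets):
        for v in states:
            if v != q[i]:
                yield q[:i] + (v,) + q[i + 1:]

def single_faults_harmless_or_flagged(sigma3, state_sets):
    """True iff every one-coordinate change of a reachable state is either
    flagged by the detector or maps to the same state of M under sigma_3."""
    P = set(sigma3)
    return all(
        detector(P, v) == 1 or sigma3[v] == sigma3[q]
        for q in P
        for v in single_coordinate_variants(q, state_sets)
    )

# N'_1 (Example 6.3): not totally redundant
sigma3_n1 = {(1, 1): "a", (1, 2): "b", (2, 1): "d", (2, 2): "c"}
# N''_1 (Example 6.4): totally redundant
sigma3_n1pp = {(1, 1, 1): "a", (2, 1, 2): "d", (2, 2, 1): "c", (1, 2, 2): "b"}

print(single_faults_harmless_or_flagged(sigma3_n1, [(1, 2), (1, 2)]))            # False
print(single_faults_harmless_or_flagged(sigma3_n1pp, [(1, 2), (1, 2), (1, 2)]))  # True
```

For $N'_1$ the check fails at, for example, $(p_1, q_1) \to (p_2, q_1)$: both states are reachable, so $\lambda_D$ stays 0, yet $\sigma_3$ changes from $a$ to $d$. This is exactly the situation used in the necessity half of the proof.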

This characterization of combinationally diagnosable networks provides an answer to Questions II and V, page 11; namely, totally redundant realizations are diagnosable with a combinational detector and with zero delay, and the structural property of "total redundancy" is conducive to on-line diagnosability.

Given $C \subseteq \{1, \ldots, n\}$, let $\pi_C = \{B_1, \ldots, B_\ell\}$. Then $C$ induces a partition $\hat{\pi}_C$ on $P$ where $\hat{\pi}_C = \{B_1 \cap P, \ldots, B_\ell \cap P\} - \{\emptyset\}$.

If a partition $\pi$ of a set $L$ is a singleton cover then we will denote this by writing $\pi = 0$. This notation is derived from the observation that this partition is the least element of the lattice of all partitions of $L$.

Corollary 6.1.1: Let $N$ be a state network of machines. Then $(N, U_C)$ is $(D, 0)$-2-diagnosable for some combinational detector $D$ if and only if $\hat{\pi}_{\bar{C}_i} = 0$ for all $i$, $1 \le i \le n$.

Proof: Consider $N$ to be realizing the reduction of $M_N$. Then $\sigma_3$ is 1-1. By Theorems 3.2 and 3.3, $(N, U_C)$ is $(D, 0)$-2-diagnosable for some combinational $D$ if and only if $(N, U_C)$ is $(D, 0)$-1-diagnosable for some combinational $D$.

Now since $\sigma_3$ is 1-1, $\mathcal{C}_{\bar{C}_i}$ is a singleton cover if and only if $\hat{\pi}_{\bar{C}_i} = 0$. Hence $N$ is totally redundant if and only if $\hat{\pi}_{\bar{C}_i} = 0$ for all $i$, $1 \le i \le n$. The result now follows immediately from Theorem 6.1.

Example 6.4: Consider state network $N'_1$ of Example 6.3. Since $N'_1$ is not totally redundant, from Theorem 6.1 we know that $(N'_1, U_C)$ is not $(D, 0)$-1-diagnosable for any combinational detector $D$.

Now construct a new network $N''_1$ from $N'_1$ by adding a new component $M_3$ as shown in Fig. 6.5.

$N''_1 = (I, R, (M_1, M_2, M_3), (K_1, K_2, K_3))$

$I$, $R$, $M_1$, $M_2$, $K_1$, and $K_2$ are identical to those of network $N_1$ of Fig. 6.2.

$K_3 = \{I\}$

Fig. 6.5. Network $N''_1$

Network $N''_1$ realizes machine $M'_1$ under $(e, e, \sigma_3)$, where $\sigma_3$, from $P''_1$ to the states of $M'_1$, is given by the following table:
p     q     s     σ_3(p, q, s)
p_1   q_1   s_1   a
p_2   q_1   s_2   d
p_2   q_2   s_1   c
p_1   q_2   s_2   b

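For network $N''_1$,

$\pi_{\bar{C}_1} = \pi_{\{2,3\}} = \{\{(p_1, q_1, s_1), (p_2, q_1, s_1)\}, \{(p_1, q_1, s_2), (p_2, q_1, s_2)\}, \{(p_1, q_2, s_1), (p_2, q_2, s_1)\}, \{(p_1, q_2, s_2), (p_2, q_2, s_2)\}\}$

and $\mathcal{C}_{\bar{C}_1} = \{\{a\}, \{d\}, \{c\}, \{b\}\}$. Thus $\mathcal{C}_{\bar{C}_1}$ is a singleton cover and component $M_1$ is redundant. Similarly one can show that $M_2$ and $M_3$ are redundant. Hence $N''_1$ is totally redundant, and $(N''_1, U_C)$ is $(D, 0)$-1-diagnosable for some combinational detector $D$.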

It is enlightening to consider Corollary 6.1.1 from the point of view of error detecting codes. Let $N$ be a state network realization of a reachable state machine $M$. Then each of the reachable states of $N$ can be viewed as a code word of an encoding of the states of $M$. Two such code words are said to be adjacent if they differ in only one coordinate. Clearly, an encoding can be used to detect all errors in single coordinates if and only if no two code words are adjacent. In addition, it is clear that some pair of code words is adjacent if and only if $\hat{\pi}_{\bar{C}_i} \ne 0$ for some $i$. Thus Corollary 6.1.1 tells us that single-error-detecting state assignments are necessary and sufficient to ensure combinational and delayless 2-diagnosis of unrestricted component faults.
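Under this coding view, the condition of Corollary 6.1.1 becomes a minimum-distance check on the reachable states. Every single-coordinate error is detectable exactly when the minimum Hamming distance between distinct code words is at least 2. The sketch below computes that distance for the reachable parts of $N'_1$ and $N''_1$, with component states again encoded by their indices. The function name is illustrative.

```python
from itertools import combinations

def min_distance(code_words):
    """Minimum number of differing coordinates over all pairs of distinct code words."""
    return min(sum(x != y for x, y in zip(u, v))
               for u, v in combinations(code_words, 2))

P_n1 = [(1, 1), (1, 2), (2, 1), (2, 2)]                 # reachable part of N'_1
P_n1pp = [(1, 1, 1), (2, 1, 2), (2, 2, 1), (1, 2, 2)]   # reachable part of N''_1

print(min_distance(P_n1))    # 1: some code words are adjacent
print(min_distance(P_n1pp))  # 2: no two code words are adjacent
```

The added component $M_3$ of Example 6.4 acts like a parity bit: it raises the minimum distance from 1 to 2.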

The generalization of the characterization given by Theorem 6.1 and Corollary 6.1.1 to faults caused by multiple failures is straightforward. For example, if failures in two components are being considered, then a totally redundant network would be one in which the corresponding state of $M$ could be deduced from the states of any $n-2$ components of $N$. With this altered definition the statement of Theorem 6.1 could then remain unchanged. By considering the two failed components as one larger component, the proof could also remain virtually unchanged.
6.4 Construction of Combinationally Diagnosable Networks

The basic problem approached in this section is a constrained decomposition problem; namely, given a state machine $M$, find a totally redundant network realization $N$ of $M$. From Theorem 6.1 we know that such a network would be combinationally diagnosable, and thus a solution to this problem would be an answer to Question I, page 11.

The approach to this problem taken here is to find a network realization by conventional decomposition techniques and then make this network totally redundant through the addition of one component.

Example 6.4 showed that a totally redundant network could be constructed from network $N'_1$ through the addition of one component machine. In this section it is shown that this can be done for any network. In addition, upper and lower bounds are derived on the minimum number of states that such an additional component must have.

Theorem 6.2: Let $N$ be a state network of machines. Let $m_i = |Q_i|$, and let $m = \max_{1 \le i \le n} m_i$. A network $N'$, where $N'$ realizes $N$ and $(N', U_C)$ is $(D, 0)$-2-diagnosable for some combinational detector $D$, can be constructed from $N$ by the addition of an $m$-state component.

Proof: Without loss of generality take $Q_i = \{0, \ldots, m_i - 1\}$. Let $N = (I, R, (M_1, \ldots, M_n), (K_1, \ldots, K_n))$ and let $N' = (I, R, (M_1, \ldots, M_n, M_{n+1}), (K_1, \ldots, K_n, K_{n+1}))$, where $K_{n+1} = \{Q_1, \ldots, Q_n, I\}$ and where $M_{n+1}$ is constructed such that for all $q = (q_1, \ldots, q_{n+1}) \in P'$, the reachable part of $N'$, $\sum_{k=1}^{n+1} q_k \equiv 0 \pmod{m}$. A machine $M_{n+1}$ with $m$ states which satisfies the above property is described below:

$M_{n+1} = (I_{n+1}, Q_{n+1}, \delta_{n+1}, R, \rho_{n+1})$

where

$I_{n+1} = (\times_{i=1}^{n} Q_i) \times I$

$Q_{n+1} = \{0, \ldots, m-1\}$

$\rho_{n+1}(r) \equiv -\sum_{k=1}^{n} \rho_k(r) \pmod{m}$ for all $r \in R$

$\delta_{n+1}(q_{n+1}, (q_1, \ldots, q_n, a)) \equiv -\sum_{k=1}^{n} q'_k \pmod{m}$ for all $(q_1, \ldots, q_{n+1}) \in \times_{i=1}^{n+1} Q_i$ and all $a \in I$, where $(q'_1, \ldots, q'_n) = \delta((q_1, \ldots, q_n), a)$.

It is clear that $N'$ realizes $N$. Therefore, it remains only to be shown that $(N', U_C)$ is $(D, 0)$-2-diagnosable for some combinational $D$.

Let $D$ be the combinational machine which realizes the function $\lambda_D: \times_{i=1}^{n+1} Q_i \to \{0, 1\}$ where

$\lambda_D(q_1, \ldots, q_{n+1}) = 0$ if $\sum_{k=1}^{n+1} q_k \equiv 0 \pmod{m}$, and $1$ otherwise.

Since $(q_1, \ldots, q_{n+1}) \in P'$ implies $\sum_{k=1}^{n+1} q_k \equiv 0 \pmod{m}$, no false alarms will occur.

Let $(r, x, y)$ be a minimal 2-error caused by $f \in U_C$. Since $(r, x, y)$ is a minimal error and $f$ only affects one component of $N'$, $\beta_r(x)$ and $\beta^f_r(x)$ will differ in exactly one coordinate. Say $\beta_r(x) = (q_1, \ldots, q_i, \ldots, q_{n+1})$ and $\beta^f_r(x) = (q_1, \ldots, q'_i, \ldots, q_{n+1})$. Now $(q_1, \ldots, q_{n+1}) \in P'$ implies $\sum_{k=1}^{n+1} q_k \equiv 0 \pmod{m}$. Since $q_i \ne q'_i$ and $|Q_i| \le m$, $q_i \not\equiv q'_i \pmod{m}$. Therefore $q_1 + \cdots + q'_i + \cdots + q_{n+1} \not\equiv 0 \pmod{m}$. Hence the error $(r, x, y)$ is detected without delay, and $(N', U_C)$ is $(D, 0)$-2-diagnosable.
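The construction in the proof of Theorem 6.2 is a checksum. The added component stores the negated sum of the other components' next states modulo $m$, so every reachable state of $N'$ sums to 0 (mod $m$). The Python sketch below applies this to a hypothetical network of two independent counters (a mod-3 counter and a mod-2 counter, both counting input 1s, with reset state (0, 0)). This network and all names in the code are assumptions for illustration. The sketch confirms that the detector raises no false alarms on $P'$ and flags every single-coordinate change of a reachable state.

```python
from collections import deque

SIZES = [3, 2]        # hypothetical |Q_1| = 3, |Q_2| = 2
m = max(SIZES)        # the added component has m states (Theorem 6.2)
INPUTS = [0, 1]       # hypothetical external input alphabet I

def base_delta(q, a):
    """Hypothetical network N: M_1 counts 1s mod 3, M_2 counts 1s mod 2."""
    return ((q[0] + a) % 3, (q[1] + a) % 2)

def augmented_delta(state, a):
    """N': step N, then set M_{n+1} to -(sum of the new states of M_1..M_n) mod m."""
    nxt = base_delta(state[:-1], a)
    return nxt + ((-sum(nxt)) % m,)

def detector(state):
    """lambda_D = 0 iff the component states sum to 0 (mod m)."""
    return 0 if sum(state) % m == 0 else 1

def reachable(start):
    """Reachable part P' of N' from its reset state (breadth-first search)."""
    seen, frontier = {start}, deque([start])
    while frontier:
        q = frontier.popleft()
        for a in INPUTS:
            nxt = augmented_delta(q, a)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

reset = (0, 0)
start = reset + ((-sum(reset)) % m,)   # rho_{n+1}(r) = -(sum of rho_i(r)) mod m
P_prime = reachable(start)
sizes = SIZES + [m]

no_false_alarms = all(detector(q) == 0 for q in P_prime)
all_single_changes_flagged = all(
    detector(q[:i] + (v,) + q[i + 1:]) == 1
    for q in P_prime
    for i, size in enumerate(sizes)
    for v in range(size)
    if v != q[i]
)
print(len(P_prime), no_false_alarms, all_single_changes_flagged)  # 6 True True
```

Because every coordinate takes values in $\{0, \ldots, m-1\}$, changing one coordinate alters the sum by a nonzero amount smaller than $m$ in absolute value. This is the step $q_i \not\equiv q'_i \pmod{m}$ in the proof.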


In the proof of Theorem 6. 2 a construction is given which 
can be used to form a totally redundant network from any network 
of machines. This construction simply involves the addition of one 
component to N. This theorem also gives an upper bound on the 
amount of additional redundancy required to make a given network 
totally redundant. This upper bound is stated in terms of the size of 
the state set of the additional component. 

The detector used in the proof of Theorem 6.2 simply checked whether the states of the components always summed to 0 (mod $m$). By using a different and possibly more complex detector, namely one which can determine whether the present state is in the reachable part, the number of states which the additional component must have can be reduced.

Let $m'_i$ be the number of states that $M_i$, $1 \le i \le n$, can actually enter while $M_i$ is a component of network $N$, and let $m' = \max_{1 \le i \le n} m'_i$. That is, let $m' = \max_{1 \le i \le n} |P_i(P)|$, where $P_i(P)$ is the projection onto coordinate $i$ of the reachable part of $N$. Then $m' \le m$ because $P_i(P) \subseteq Q_i$, $1 \le i \le n$, and Theorem 6.2 holds with $m$ replaced by $m'$. This claim is established in the following theorem.

Theorem 6.3: Let $N$ be a state network of machines. Let $m'_i = |P_i(P)|$, and let $m' = \max_{1 \le i \le n} m'_i$. A network $N'$ can be constructed from $N$ by the addition of an $m'$-state component such that $N'$ realizes $N$ and $(N', U_C)$ is $(D, 0)$-2-diagnosable for some combinational detector $D$.

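Proof: Without loss of generality take $P_i(P) = \{0, \ldots, m'_i - 1\}$ and $Q_i = \{0, \ldots, m_i - 1\}$. Construct $N'$ by adding component $M_{n+1}$, where $N'$ and $M_{n+1}$ are exactly as in the proof of Theorem 6.2 except for $m$ being replaced by $m'$.

We will show that $(N', U_C)$ is $(D, 0)$-2-diagnosable by showing that $\hat{\pi}_{\bar{C}_i} = 0$ for all $i$, $1 \le i \le n$, and then appealing to Corollary 6.1.1. (For $i = n+1$ this is immediate, since in $P'$ the state of $M_{n+1}$ is determined by the states of $M_1, \ldots, M_n$.)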
Assume, to the contrary, that $\hat{\pi}_{\bar{C}_i} \ne 0$ for some $i$, say for $i = 1$. Let $\pi_{\bar{C}_1} = \{B_1, \ldots, B_\ell\}$. Then for some $j$, $1 \le j \le \ell$, $|B_j \cap P'| > 1$. This implies the existence of two states $q = (q_1, q_2, \ldots, q_{n+1})$ and $q' = (q'_1, q_2, \ldots, q_{n+1})$ such that $q, q' \in P'$ and $q_1 \ne q'_1$. Now $q, q' \in P'$ implies $q_1 + q_2 + \cdots + q_{n+1} \equiv 0 \pmod{m'}$ and $q'_1 + q_2 + \cdots + q_{n+1} \equiv 0 \pmod{m'}$. Hence $q_1 \equiv q'_1 \pmod{m'}$, and since $0 \le q_1, q'_1 < m'$, $q_1 = q'_1$. Contradiction. Therefore $\hat{\pi}_{\bar{C}_i} = 0$ for all $i$, $1 \le i \le n$, and the result follows immediately from Corollary 6.1.1.
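Theorem 6.3 needs only the projections $P_i(P)$, not the full state sets. The sketch below starts from a hypothetical reachable part $P$ of a two-component network with $Q_1 = \{0, 1, 2, 3\}$ and $Q_2 = \{0, 1, 2\}$ (so $m = 4$). It relabels each projection onto $\{0, \ldots, m'_i - 1\}$, as in the proof, and appends the check coordinate modulo $m'$. It assumes, as the construction ensures, that the reachable part of $N'$ is obtained this way. Finally it verifies that $\hat{\pi}_{\bar{C}_i} = 0$ for every component of $N'$. The data and helper names are illustrative.

```python
P = {(0, 0), (2, 1), (0, 2), (2, 0), (0, 1), (2, 2)}   # hypothetical reachable part of N
m = 4                                                  # max |Q_i| for Q_1 = {0..3}, Q_2 = {0..2}
n = 2

projections = [sorted({q[i] for q in P}) for i in range(n)]
m_prime = max(len(p) for p in projections)             # m' = max |P_i(P)|

# Relabel each P_i(P) onto {0, ..., m'_i - 1} (the "without loss of generality" step)
relabel = [{v: k for k, v in enumerate(p)} for p in projections]
P_relabeled = {tuple(relabel[i][q[i]] for i in range(n)) for q in P}

# Reachable part of N': append M_{n+1}'s state, -(sum) mod m'
P_prime = {q + ((-sum(q)) % m_prime,) for q in P_relabeled}

def partition_is_zero(states, i):
    """hat-pi_{C-bar_i} = 0: no two states agree on every coordinate except i."""
    keys = [s[:i] + s[i + 1:] for s in states]
    return len(keys) == len(set(keys))

print(m, m_prime)                                                # 4 3
print(all(partition_is_zero(P_prime, i) for i in range(n + 1)))  # True
```

Here a 3-state check component suffices, one state fewer than the $m = 4$ states that the construction of Theorem 6.2 would use.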

A technique similar to the one used in the proof of Theorem 6.2 could be used for the diagnosis of $n$ Mealy machines which operate in parallel with the same inputs and resets. In this case one additional Mealy machine would be required which had as many output symbols as the machine with the largest output alphabet. There is no guarantee, however, that this technique will result in a savings over duplication, because the additional machine may need as many states as the product of the numbers of states of the original $n$ machines.

We have shown that, given a network $N$, a totally redundant network $N'$ can be constructed through the addition of a component with no more than $m'$ states, where $m' = \max_{1 \le i \le n} |P_i(P)|$. This amount of additional redundancy is not always necessary, for $N$ may already be totally redundant. The following example shows that this amount of additional redundancy is not necessary even if no component of the network is redundant.
Example 6.5: Consider state network $N_2$ of Fig. 6.6.

$N_2 = (I, R, (M_1, M_2), (K_1, K_2))$
$I = \{0, 1, 2, 3, 4\}$, $R = \{r\}$
$(K_1, K_2) = (\{I\}, \{I\})$

Fig. 6.6. Network $N_2$

$N_2$ realizes state machine $M_2$ of Fig. 6.7 under $(e, e, \sigma_3)$, where $\sigma_3: P_2 \to Q_2$ is given by the table in Fig. 6.8.
Since $|Q_2| = 8$ and $|Q_1 \times Q_2| = 16$, it should be clear that while $N_2$ is not totally redundant, there is some redundancy in this network realization of $M_2$. Thus if we were to add a component $M_3$ to $N_2$ in an attempt to form a totally redundant network $N'_2$, we should not be too surprised if we succeeded with a component $M_3$ with fewer than $m'$ states, where for network $N_2$, $m' = 4$. In fact, if the 2-state machine $M_3 = (Q_1 \times Q_2 \times I, \{s_1, s_2\}, \delta_3)$ were added to $N_2$, where $\delta_3$ is such that $M_3$ is in $s_1$ whenever $M_1$ and $M_2$ are in one of the state pairs [list partly illegible in source; the legible pairs are $(p_3, q_3)$ and $(p_4, q_4)$], and in $s_2$ whenever they are in one of the pairs [illegible in source], then the network $N'_2$ so formed would be totally redundant.

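An intuitively satisfying means to verify this claim is as follows. Component $M_i$ computes the information $\mathcal{C}_{\{i\}}$ about the corresponding state of $M_2$. In this case the $\mathcal{C}_{\{i\}}$ are the following partitions of $Q_2$:

$\mathcal{C}_{\{1\}} = \{\{a, d\}, \{b, e\}, \{c, h\}, \{f, g\}\}$
$\mathcal{C}_{\{2\}} = \{\{a, b\}, \{c, d\}, \{e, f\}, \{g, h\}\}$
$\mathcal{C}_{\{3\}} = \{\{a, c, e, g\}, \{b, d, f, h\}\}$

Since $\mathcal{C}_{\{1\}} \cdot \mathcal{C}_{\{2\}} = \mathcal{C}_{\{1\}} \cdot \mathcal{C}_{\{3\}} = \mathcal{C}_{\{2\}} \cdot \mathcal{C}_{\{3\}} = 0$, any two components taken together provide total information as to the corresponding state in $Q_2$. Hence the remaining one will always be redundant.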
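The product condition can be checked directly from the three partitions listed above. The sketch below tests that each pairwise product is the zero partition, meaning every intersection of blocks has at most one element. It also checks that no single component's partition is zero on its own. Only the partitions come from the example; the code is an illustrative check.

```python
from itertools import combinations

# The partitions C_{i} of Q_2 = {a, ..., h} from Example 6.5
C = {
    1: [set("ad"), set("be"), set("ch"), set("fg")],
    2: [set("ab"), set("cd"), set("ef"), set("gh")],
    3: [set("aceg"), set("bdfh")],
}

def product_is_zero(p1, p2):
    """pi_1 . pi_2 = 0 iff every intersection of a block of p1 with a block of p2
    has at most one element."""
    return all(len(b1 & b2) <= 1 for b1 in p1 for b2 in p2)

print(all(product_is_zero(C[i], C[j]) for i, j in combinations(C, 2)))  # True
print(any(all(len(b) <= 1 for b in C[i]) for i in C))                   # False
```

So no single component determines the state of $M_2$, but any two of them do. This is the sense in which each component of $N'_2$ is redundant.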
The following result gives a lower bound on the number of states that an additional component must have in order for the resulting augmented network to be totally redundant. If the network under consideration is already totally redundant, then the lower bound given by this result is one. Since the behavior of a state machine with one state is always a constant function, the actual addition of such a component is unnecessary.

Theorem 6.4: Let $N$ be an $n$-component state network and let $N'$ be the state network formed from $N$ by the addition of a component with $\ell$ states. If $N'$ is totally redundant then $\ell \ge \max_{1 \le i \le n} \#|\mathcal{C}_{\bar{C}_i}|$.

Proof: Without loss of generality take $\#|\mathcal{C}_{\bar{C}_1}| = \max_{1 \le i \le n} \#|\mathcal{C}_{\bar{C}_i}|$, and let $d = \#|\mathcal{C}_{\bar{C}_1}|$. Then for some $B \in \pi_{\bar{C}_1}$ and $q = (q_1, \ldots, q_n) \in B$, $|\sigma_3(B \cap P)| = d$. That is, if it is known that $M_2$ is in $q_2$, that $M_3$ is in $q_3$, and so forth up to $M_n$ being in $q_n$, then there is still a $d$-state uncertainty as to which state of $M$ the state of $N$ currently corresponds. It is necessary for $M_{n+1}$ to have at least $d$ states to resolve this uncertainty.
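The bound of Theorem 6.4 can be computed from $\sigma_3$ alone. The sketch below evaluates $\max_i \#|\mathcal{C}_{\bar{C}_i}|$ for network $N'_1$ of Example 6.3, again taking the dictionary keys as the reachable part $P$. The result, 2, is met by the two-state component $M_3$ added in Example 6.4. The function names are illustrative.

```python
def complement_cover(sigma3, i):
    """Cover induced by all components except i (0-based), as in Section 6.3."""
    groups = {}
    for q, m_state in sigma3.items():
        groups.setdefault(q[:i] + q[i + 1:], set()).add(m_state)
    return list(groups.values())

def lower_bound(sigma3, n):
    """max over i of #|C_{C-bar_i}|, the size of the largest block."""
    return max(max(len(b) for b in complement_cover(sigma3, i)) for i in range(n))

sigma3_n1 = {("p1", "q1"): "a", ("p1", "q2"): "b",
             ("p2", "q1"): "d", ("p2", "q2"): "c"}
print(lower_bound(sigma3_n1, 2))  # 2
```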

The above result provides a good lower bound on the amount of additional redundancy required to form a totally redundant network, and it does so by taking into account the redundancy which already exists in the network. This level of redundancy, however, is not always sufficient, because it may be impossible to find a component with $d$ states which will simultaneously resolve the uncertainties represented by $\mathcal{C}_{\bar{C}_1}, \mathcal{C}_{\bar{C}_2}, \ldots, \mathcal{C}_{\bar{C}_n}$. The following example describes just such a situation.


Example 6.6: Consider the state network $N_3$ of Fig. 6.9.

$N_3 = (I, R, (M_1, M_2, M_3), (K_1, K_2, K_3))$
$I = \{0, 1, 2\}$, $R = \{r\}$
$(K_1, K_2, K_3) = (\{I\}, \{I\}, \{Q_1, Q_2, I\})$

Fig. 6.9. Network $N_3$

This network realizes machine $M_3$ of Fig. 6.10.

Fig. 6.10. Machine $M_3$


For $N_3$ realizing $M_3$ we have

$\mathcal{C}_{\bar{C}_1} = \{\{a, c\}, \{b, d\}, \{e, g\}, \{f, h\}\}$
$\mathcal{C}_{\bar{C}_2} = \{\{a, e\}, \{b, h\}, \{c\}, \{d\}, \{f\}, \{g\}\}$
$\mathcal{C}_{\bar{C}_3} = \{\{a\}, \{b\}, \{c, d\}, \{e, f\}, \{g, h\}\}$

Therefore $m = \max_{1 \le i \le 3} |Q_i| = 3$ and $d = \max_{1 \le i \le 3} \#|\mathcal{C}_{\bar{C}_i}| = 2$.

Suppose that it is desired to add a component $M_4$ to $N_3$ in order to form a totally redundant network. Theorem 6.4 tells us that $M_4$ must have at least 2 states, and Theorem 6.2 tells us that there is a 3-state component which will work. We will show that in this case it is not sufficient for $M_4$ to have 2 states.
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Let be a 2-state component which when added to forms 
Ng. Let = {Bj, B 2 }. Since C| 4 j is a cover of Q^> B i u B 2 = 
Qg. If |b^ | > 5 or Jb 2 | > 5 then would not be a singleton cover 
because M 2 and M 3 have only 2 states each and together they could 
not resolve a 5-state uncertainty. Therefore if N 2 is to be 
totally redundant we must have | J , |b 2 | <4 and thus Cj 4 j will 
be a partition of Q^. 

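For the extended network to be totally redundant, $M_4$ must resolve every
pair of states that appears as a block of $C_1$, $C_2$, or $C_3$: $\{a,c\}$, $\{b,d\}$,
$\{e,g\}$, $\{f,h\}$, $\{a,e\}$, $\{b,h\}$, $\{c,d\}$, $\{e,f\}$, and $\{g,h\}$. It can resolve a pair
only if the pair is split between $B_1$ and $B_2$. But these nine pairs cannot
all be simultaneously split by any two-block partition: the pairs $\{a,c\}$,
$\{c,d\}$, $\{b,d\}$, $\{b,h\}$, $\{g,h\}$, $\{e,g\}$, and $\{a,e\}$ form the cycle
a, c, d, b, h, g, e, a of odd length 7, and the states around an odd cycle
cannot alternate between two blocks. Therefore there is no 2-state component
which, when added to $N_3$, will form a totally redundant network.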
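The "easy to verify" step can also be checked mechanically. The following Python sketch is a hypothetical helper, not part of the original text: it transcribes $C_1$, $C_2$, $C_3$ from the example, collects every two-state block that $M_4$ would have to split, and enumerates all two-block partitions of the eight states a, ..., h. It assumes, as in the argument above, that a pair is resolved only when its two states fall in different blocks.

```python
from itertools import combinations

# Uncertainty covers C_1, C_2, C_3 of Example 6.6, transcribed from the text.
C1 = [{"a", "c"}, {"b", "d"}, {"e", "g"}, {"f", "h"}]
C2 = [{"a", "e"}, {"b", "h"}, {"c"}, {"d"}, {"f"}, {"g"}]
C3 = [{"a"}, {"b"}, {"c", "d"}, {"e", "f"}, {"g", "h"}]
STATES = sorted(set().union(*C1, *C2, *C3))


def pairs_to_resolve(*covers):
    """Every block with more than one state must be split by the new component."""
    return [frozenset(b) for cover in covers for b in cover if len(b) > 1]


def splitting_partitions(states, pairs):
    """Yield each two-block partition (B1, B2) of states that splits every pair."""
    first, rest = states[0], states[1:]
    # Keep the first state in B1 so each unordered partition is visited once;
    # k < len(rest) keeps B2 nonempty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            b1 = {first, *extra}
            b2 = set(states) - b1
            if all(len(p & b1) == 1 for p in pairs):
                yield b1, b2


pairs = pairs_to_resolve(C1, C2, C3)
found = list(splitting_partitions(STATES, pairs))
print(len(pairs), found)  # 9 []
```

The exhaustive search is only feasible because the state set is tiny (64 candidate partitions); the odd-cycle argument in the text is what explains why the list comes back empty.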



CHAPTER VII 


Conclusions and Open Problems 

In this report a fresh look at on-line diagnosis was taken from 
a system theoretic point of view. The approach used in this inves- 
tigation was system theoretic in the sense that resettable discrete- 
time systems were used as a basis for a well-developed formal 
model of on-line diagnosis, and formal methods were used to inves- 
tigate this model. As evidenced by the results in Chapters III 
through VI this approach has proved to be very fruitful. One advan- 
tage of this approach is that the results developed in this report 
are independent of any particular technology and may be applied to 
any system which can be modeled as a resettable machine. 

In Chapter I, a number of fundamental questions concerning on- 
line diagnosis were stated, and in Chapter II a complete model for 
the study of on-line diagnosis was developed. Subsequent chapters 
provided some answers to these questions for the unrestricted fault 
case and the unrestricted component fault case. At this point it is 
appropriate to review these questions to see just what has been ac- 
complished and what remains to be done. These five questions are 
paraphrased below, and each question is followed immediately by 
a discussion of it. 
I. What good on-line diagnosis techniques are available and when
is each applicable? 

For unrestricted faults the techniques investigated have been
duplication and loop checking. Duplication is very easy to implement,
and it was shown in Corollaries 4.5.1 and 4.5.2 that, in terms of
the state set size of the detector, it is impossible to do any better
than duplication. Thus duplication is a very good technique. However,
with other measures of complexity it may be possible to beat
duplication. In addition, duplication suffers from the drawback
that both copies could have the same failures or built-in weaknesses
from birth. For these reasons the use of inverses (loop checking) for
unrestricted fault diagnosis was also studied, and this technique was
shown to be applicable regardless of the specified behavior.
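As a rough illustration of duplication and comparison, the Python sketch below lets the detector run its own copy of the specified machine on the same inputs and compare outputs step by step. It is a simplification under stated assumptions: a single reset state, outputs compared with zero delay, and a hypothetical parity machine with an invented faulty output trace; none of these names come from the report.

```python
def duplicate_and_compare(delta, lam, q0, inputs, observed):
    """Return one alarm flag per time step: True where the monitored
    system's output differs from the output of the detector's own copy."""
    q = q0
    alarms = []
    for a, z in zip(inputs, observed):
        alarms.append(z != lam[(q, a)])  # compare with the specified output
        q = delta[(q, a)]                # advance the detector's copy
    return alarms


# Hypothetical specification: output the parity of the inputs seen so far.
delta = {(q, a): q ^ a for q in (0, 1) for a in (0, 1)}
lam = dict(delta)

inputs = [1, 0, 1, 1]
observed = [1, 1, 1, 1]  # invented trace: output stuck at 1 from time 2 on
print(duplicate_and_compare(delta, lam, 0, inputs, observed))
# [False, False, True, False]
```

The stuck output goes unflagged at time 3 because the correct output there happens to be 1; the comparison reports an error only when the observed behavior actually departs from the specification.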

For unrestricted component faults the basic technique studied 
was the construction of totally redundant networks from arbitrary 
networks through the addition of one component. This technique 
was also shown to apply to any specified machine. 

Certainly, other techniques for the diagnosis of these sets of 
faults exist and their investigation is an open problem. One fruitful 
direction might be to pursue a more general approach to the
constrained decomposition problem discussed in Section 6.4.





II. When is a given realization diagnosable?

Answers to this question depend, of course, on what constraints 
on allowable detectors and delays are given by a particular meaning 
of the word "diagnosable." If no restrictions are placed on the set of
possible detectors then every realization is diagnosable for any set 
of faults since the realization could be duplicated in the detector. 

For unrestricted faults, if detectors are only allowed to perform 
a loop check then Theorem 5.2 tells us that realizations with
lossless inverses are diagnosable. However, the characterization of all
realizations which are diagnosable in this sense is still an open 
problem. 

For unrestricted component faults, we know from Theorem 6.1
that a realization is diagnosable if and only if it is totally redundant. 

III. What time-space tradeoffs are possible between the added
complexity needed for diagnosis and the maximum allowable delay?

By Corollaries 4.5.1 and 4.5.2 we know that no time-space
tradeoff is possible for unrestricted faults. However, Example
4.1 shows that a tradeoff is possible for permanent output faults.

For unrestricted component faults the question remains unanswered. 

While no generally useful time-space tradeoffs have been found,
specific tradeoffs are possible for suitably restricted sets of faults
and certain specific behaviors. In addition to Example 4.1, this is
evidenced by Example 7.1, which appears later in this chapter.
IV. What is the relationship between a given fault set and the set of
errors which can be caused by faults in that set? 

This relationship was discussed in Section 4.1 for unrestricted
faults, Section 4.3 for permanent output faults, and Section 6.2 for
unrestricted component faults. Briefly, unrestricted faults can
cause any possible erroneous behavior; permanent output faults
cause the same minimal 2-errors as unrestricted faults but not the
same minimal 1-errors, because the output becomes constant once
a permanent output fault occurs; unrestricted component faults cause
minimal 2-errors which are out of tolerance in only one coordinate,
and if the network under consideration is totally redundant then
minimal 1-errors caused by unrestricted component faults always
result in an unreachable state of N being entered. As expected,
this relationship is very important and was used in the results
concerning each of these sets of faults.

V. What properties of system structure and behavior are conducive 
to on-line diagnosis? 

For unrestricted component faults the structural property of 
total redundancy was seen to be quite important. The behavioral 
property of "having a lossless inverse" was also seen to be useful 
since the unrestricted faults of such systems could be diagnosed via 
a loop check. 
A potentially fruitful area for further work would be to look at
special subclasses of machines (e.g., definite machines and linear
machines) to see what diagnosis qualities they possess which
are not possessed by machines in general.

Since this study focused on the diagnosis of unrestricted faults 
and unrestricted component faults, one large open area for further 
research is to answer these questions for other important sets of 
faults. A possible direction for such research is outlined below. 

In this report, abstract (i.e., totally unstructured) systems
have been considered with the exception of some of the examples and 
the networks considered in Chapter VI. Such an approach is good 
for developing formally the concepts involved in our theory and for 
studying the diagnosis of unrestricted faults, but some of the questions 
raised can best be studied in a more structured environment. One 
reason for this is that with a structured system we can consider the 
causes of faults. For example, given an abstract system it makes 
no sense to speak of the set of faults caused by component failures 
of a certain type or by bridging failures. However, given a structured 
representation of a system (e.g., a circuit diagram) we can discuss
these and other types of failures and determine the corresponding 
faults. 

There are many different structural levels that could prove 
useful to a further investigation into the theory of on-line diagnosis. 
Two levels which we believe will be important are the binary
state-assigned level and the logical circuit level. These levels
and the basis for their potential usefulness are explained below.

A machine M is said to be binary state-assigned if $Q = \{0, 1\}^n$
for some positive integer n. Given such a machine, various types
of memory failures such as stuck-at-0, stuck-at-1, and more 
general types can be considered. The faults corresponding to these 
failures can be enumerated and comparisons can be made between 
various schemes for diagnosing these faults. Memory faults have 
been studied before in the context of fault tolerance and off-line
diagnosis by Meyer [28] and Yeh [38], respectively, and they are
an important class of faults for a number of reasons. For example, 
only a limited amount of structure is needed to discuss them. Thus 
memory faults can be analyzed before the circuit design of the 
machine is complete. Also, it is memory which distinguishes truly 
sequential systems from purely combinational (one-state) systems.
Combinational systems are inherently easier than sequential systems 
to analyze and a number of techniques for the on-line diagnosis of 
such systems are known (see, for example, [19] and [34]).

Time-space tradeoffs are also possible in the diagnosis of memory
faults. Let $F_m$ denote the set of single memory stuck-at faults, that
is, the set of faults caused by a stuck-at failure in one memory
element. It can easily be verified that if $(M, F_m)$ is $(D, 0)$-2-diagnosable,
where D is combinational, then the reachable states of M
must be encoded into a single-error-detecting code: intuitively, a
combinational detector with zero delay sees only the current state, and a
stuck-at failure changes at most one state bit, so no such change may carry
one reachable state onto another. However, as the following example shows,
this is not necessarily true if nonzero delay is allowed.
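The single-error-detecting condition can be made concrete: every effective single stuck-at change of a codeword must land outside the code, which holds exactly when distinct codewords differ in at least two positions. The Python sketch below checks this for an arbitrary set of codewords; the helper names and the 3-bit even-parity code used in the example are illustrative assumptions, not taken from the report.

```python
from itertools import combinations, product


def min_distance(codewords):
    """Minimum Hamming distance between distinct codewords (bit strings)."""
    return min(sum(x != y for x, y in zip(u, v))
               for u, v in combinations(codewords, 2))


def stuck_at(word, i, value):
    """Value read from the state after memory element i is stuck at value."""
    return word[:i] + value + word[i + 1:]


def detects_single_stuck_at(codewords):
    """True if every effective single stuck-at change leaves the code."""
    code = set(codewords)
    n = len(next(iter(code)))
    return all(stuck_at(w, i, v) == w or stuck_at(w, i, v) not in code
               for w in code for i in range(n) for v in "01")


# Illustrative encoding: the 3-bit even-parity code.
even = ["".join(b) for b in product("01", repeat=3) if b.count("1") % 2 == 0]
print(even, min_distance(even), detects_single_stuck_at(even))
# ['000', '011', '101', '110'] 2 True
```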

Example 7.1:

Consider the binary state-assigned state machine M whose 
state graph is shown in Fig. 7. 1. Since M is an autonomous state 
machine the labels on the transitions convey no information and 
hence are not shown. 



Fig. 7.1. State Graph of M


Claim. $(M, F_m)$ is $(D, 2)$-2-diagnosable for some combinational D.

Let D be the combinational detector which realizes the function
specified by the following table:

   z    D(z)
  000    1
  001    0
  010    0
  011    0
  100    0
  101    0
  110    0
  111    1


Thus a fault is indicated if and only if the detector observes 
that the system it is monitoring has entered one of the two unreach- 
able states 000 and 111. 
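The detector D is simple enough to state directly. The Python sketch below encodes the table above and, as an illustration that is not in the original text, lists the single stuck-at changes that map one of the six reachable states onto another reachable state. D cannot flag those at the instant they occur, which is why the example relies on the transitions of M (Fig. 7.1, not reproduced here) and a delay of 2 steps. The function names are hypothetical.

```python
def D(z):
    """Combinational detector of Example 7.1: 1 iff z is 000 or 111."""
    return 1 if z in ("000", "111") else 0


def stuck_at(z, i, value):
    """State read after memory element i (0 = first coordinate) sticks."""
    return z[:i] + value + z[i + 1:]


STATES = [format(n, "03b") for n in range(8)]
REACHABLE = [z for z in STATES if D(z) == 0]  # the six states other than 000, 111

# Single stuck-at changes that turn a reachable state into another
# reachable state, so D reports nothing at the moment of the fault.
silent = [(z, i, v) for z in REACHABLE for i in range(3) for v in "01"
          if stuck_at(z, i, v) != z and D(stuck_at(z, i, v)) == 0]

print([D(z) for z in STATES])     # [1, 0, 0, 0, 0, 0, 0, 1]
print(("011", 1, "0") in silent)  # True: 011 becomes 001
```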

It is instructive to view the action of M in three dimensions, as
shown in Fig. 7.2. In this figure the action of the unreachable (or
"error-indicating") states has been omitted for clarity.



Fig. 7.2. 3-Dimensional View of M
Note that any single memory stuck-at-0 fault will cause the
resulting faulty system to enter state 000 within 2 time steps of its
occurrence. Similarly, the state 111 will be entered within 2 time
steps of the occurrence of a stuck-at-1 fault. Hence $(M, F_m)$ is
$(D, 2)$-2-diagnosable. This homing action after a fault occurs is
illustrated below in Fig. 7.3. This figure shows the state graph of
M', where $f = (M', t, q)$ is a fault of M caused by the memory
element corresponding to the second coordinate of the state
assignment becoming stuck-at-0.



Fig. 7.3. State Graph of M'

The essence of the technique used in this example is to find a
state-assigned realization with the property that any single memory
stuck-at fault will cause the resulting faulty machine to enter a
normally unreachable state within a bounded number of time steps. This
is a generalization of the basic mechanism for diagnosis used by any
scheme which involves encoding the reachable part of the state set into
a single-error-detecting code.

Having looked at the binary state-assigned level of structural
detail, let us now turn briefly to the logical circuit level. A system
possesses structure at the logical circuit level if a representation 
of the system is given in terms of a logical circuit composed of 
primitive logical elements. These may be of the AND-OR variety, 
threshold elements, or any similar elements of a "building block" 
nature depending upon the technology being considered. This level 
is useful for investigating failures in the primitive components. 

The circuit in Fig. 2.2 is an example of a structural representation
at this level, and the failure of this circuit discussed in Example 2.2
is a simple example of the analysis that can be conducted at this 
level. 

Further work could also be performed at the network level of 
structural detail which was introduced in Chapter VI. At this level 
one could study the problem of implementing on-line diagnosis on a 
whole computer whereas with the other levels the emphasis would 
be on diagnosing one module. Note that in our definition of diagnosis 
the detector is not constrained to give simply a yes-no response.

It could also provide extra information for use in automatic fault 
location. Thus, at this level, the problem of which subsystems must 
be explicitly observed by the detector to achieve some desired fault 
location property could be studied. 
One problem that requires extension of our present model (at
any structural level) is the problem of automatic reconfiguration 
of the system under the control of the detector. To study this 
problem, the model used would have to allow for feedback from 
the detector to the system it is observing. The question of how such 
an extension should be made is an interesting one and, if answered 
satisfactorily, could serve as a basis for a systematic investigation 
of reconfiguration techniques. 



APPENDIX 


Resettable Machine Theory 

The goal of this appendix is not to study the theory of resettable 
machines per se but rather to cover that part of it which is used in 
this study of on-line diagnosis. The theory of resettable machines 
follows closely the theory of sequential machines. The main 
differences in the definitions stem from the presupposition that a 
resettable machine is reset before every use. One consequence of 
this is that the "unreachable" states of a resettable machine are 
always ignored. 

We begin by repeating here the basic machine notions introduced 
in Chapter II. 

Let M be a resettable machine. The reachable part of M,
denoted by P, is the set

$P = \{\delta(\rho(r), x) \mid r \in R,\ x \in I^*\}$.

M is reachable if P = Q. M is $\ell$-reachable if

$P = \{\delta(\rho(r), x) \mid r \in R,\ x \in I^*,\ |x| \le \ell\}$.
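The reachable part P, and the smallest $\ell$ for which M is $\ell$-reachable, can be computed by a breadth-first search from the reset states. The Python sketch below assumes a hypothetical dictionary encoding of $\delta$ and $\rho$ (keys (state, input) and reset labels, respectively) and reads the bound in the definition as $|x| \le \ell$; the small counter machine in the usage lines is invented for illustration.

```python
from collections import deque


def reachable_part(inputs, delta, rho):
    """Return (P, ell): the reachable part of a resettable machine and the
    least ell such that every state of P is reached from some reset state
    by an input word of length at most ell."""
    depth = {}  # depth[q] = length of a shortest word reaching q
    queue = deque()
    for q0 in rho.values():
        if q0 not in depth:
            depth[q0] = 0
            queue.append(q0)
    while queue:
        q = queue.popleft()
        for a in inputs:
            nxt = delta[(q, a)]
            if nxt not in depth:
                depth[nxt] = depth[q] + 1
                queue.append(nxt)
    return set(depth), max(depth.values())


# Hypothetical machine: a mod-3 counter of 1s plus an extra state "u"
# that no input word reaches from the reset state.
Q = {0, 1, 2, "u"}
inputs = [0, 1]
delta = {(q, a): (q + a) % 3 for q in (0, 1, 2) for a in inputs}
delta.update({("u", a): "u" for a in inputs})
rho = {"r": 0}

P, ell = reachable_part(inputs, delta, rho)
print(P, ell, P == Q)  # {0, 1, 2} 2 False
```

Because a breadth-first search visits states in order of shortest reaching word, the largest recorded depth is exactly the least $\ell$ in the definition.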

Let $M, M' \in \mathcal{M}(I, Z, R)$. M is equivalent to M' (written $M \equiv M'$)
if $\beta_r = \beta'_r$ for all $r \in R$. Two states $q \in Q$ and $q' \in Q'$ are
equivalent ($q \equiv q'$) if $\beta_q = \beta'_{q'}$. It is easily verified that these are
both equivalence relations, the first on $\mathcal{M}(I, Z, R)$ and the second on
the states of machines in $\mathcal{M}(I, Z, R)$. M is reduced if for all
$q, q' \in P$, $q \equiv q'$ implies $q = q'$.

If M and M' are two resettable machines then M realizes M' if
there is a triple of functions $(\sigma_1, \sigma_2, \sigma_3)$, where $\sigma_1 : (I')^+ \to I^+$ is a
semigroup homomorphism such that $\sigma_1(I') \subseteq I$, $\sigma_2 : R' \to R$, and
$\sigma_3 : Z'' \to Z'$ where $Z'' \subseteq Z$, such that for all $r' \in R'$,

$\beta'_{r'} = \sigma_3 \circ \beta_{\sigma_2(r')} \circ \sigma_1$.
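For finite machines, and with $\sigma_1$ sending letters to letters as required by $\sigma_1(I') \subseteq I$, the realization condition can be decided exactly by exploring the pairs of states of M and M' reached by the same input word. The Python sketch below does this. The dictionary encoding of machines and the function name `realizes` are assumptions made for illustration, and $\sigma_3$ is given as a partial map whose keys play the role of Z''.

```python
from collections import deque


def realizes(M, Mp, s1, s2, s3):
    """Decide whether M realizes Mp under (s1, s2, s3).

    Machines are dicts with keys "I", "delta", "lam", "rho" (hypothetical
    encoding). s1 maps each input letter of Mp to an input letter of M,
    s2 maps resets of Mp to resets of M, and s3 is a dict from outputs of
    M (its keys form Z'') to outputs of Mp.
    """
    start = {(M["rho"][s2[rp]], Mp["rho"][rp]) for rp in Mp["rho"]}
    seen, queue = set(start), deque(start)
    while queue:
        q, qp = queue.popleft()  # states reached by s1(y) in M and y in Mp
        for ap in Mp["I"]:
            a = s1[ap]
            z = M["lam"][(q, a)]
            # beta'_{r'}(y ap) must equal s3(beta_{s2(r')}(s1(y) s1(ap))).
            if z not in s3 or s3[z] != Mp["lam"][(qp, ap)]:
                return False
            nxt = (M["delta"][(q, a)], Mp["delta"][(qp, ap)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True


# Hypothetical example: a mod-4 counter realizes a mod-2 (parity) counter
# when its outputs are reduced mod 2.
M = {"I": [0, 1], "rho": {"r": 0},
     "delta": {(q, a): (q + a) % 4 for q in range(4) for a in (0, 1)}}
M["lam"] = dict(M["delta"])
Mp = {"I": [0, 1], "rho": {"r": 0},
      "delta": {(q, a): q ^ a for q in (0, 1) for a in (0, 1)}}
Mp["lam"] = dict(Mp["delta"])
ident = {0: 0, 1: 1}
print(realizes(M, Mp, ident, {"r": "r"}, {z: z % 2 for z in range(4)}))  # True
print(realizes(M, Mp, ident, {"r": "r"}, {z: 0 for z in range(4)}))      # False
```

With identity maps for all three functions the same search decides whether $\beta'_r = \beta_r$ for every reset r, that is, whether the two machines are equivalent.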

The following result is analogous to the result due to Leake [23]
which was cited in Section 2.2. It supplies an alternative,
and structurally oriented, definition of realization. 

Theorem A.1: Let M and M' be two resettable machines with reachable
parts P and P'. M realizes M' if and only if there exists a
4-tuple of functions $(\eta_1, \eta_2, \eta_3, \eta_4)$, where

$\eta_1 : I' \to I$
$\eta_2 : R' \to R$
$\eta_3 : Z \to Z'$
$\eta_4 : P' \to \mathcal{P}(P) - \{\emptyset\}$   ($\mathcal{P}(P) = \{X \mid X \subseteq P\}$)

such that

i) $\delta(\eta_4(p'), \eta_1(a)) \subseteq \eta_4(\delta'(p', a))$ for all $p' \in P'$ and $a \in I'$

ii) $\eta_3(\lambda(p, \eta_1(a))) = \lambda'(p', a)$ for all $p' \in P'$, $a \in I'$, and $p \in \eta_4(p')$

iii) $\rho(\eta_2(r')) \in \eta_4(\rho'(r'))$ for all $r' \in R'$.
Proof: (Necessity) Assume that M realizes M'. Then there exists
an appropriate triple of functions $(\sigma_1, \sigma_2, \sigma_3)$ such that
$\beta'_{\rho'(r')}(x) = \sigma_3(\beta_{\rho(\sigma_2(r'))}(\sigma_1(x)))$. Therefore

$\beta'_{\rho'(r')}(uv) = \sigma_3(\beta_{\rho(\sigma_2(r'))}(\sigma_1(uv)))$

for each $r' \in R'$, $u \in (I')^*$ and $v \in (I')^+$. Hence,

$\beta'_{\delta'(\rho'(r'), u)}(v) = \sigma_3(\beta_{\delta(\rho(\sigma_2(r')), \sigma_1(u))}(\sigma_1(v)))$.

Thus for each $p' \in P'$ there is a $p \in P$ such that

$\beta'_{p'}(v) = \sigma_3(\beta_p(\sigma_1(v)))$ for all $v \in (I')^+$.

Consider $\eta_4 : P' \to \mathcal{P}(P) - \{\emptyset\}$ defined by

$\eta_4(p') = \{p \in P \mid \beta'_{p'} = \sigma_3 \circ \beta_p \circ \sigma_1\}$

and consider $\eta_1 : I' \to I$ defined by

$\eta_1(a) = \sigma_1(a)$.

Claim: The 4-tuple $(\eta_1, \sigma_2, \bar\sigma_3, \eta_4)$, where $\bar\sigma_3$ is an arbitrary extension
of $\sigma_3$ to Z, satisfies i), ii), and iii).

i) Let $p \in \eta_4(p')$. We must show $\delta(p, \eta_1(a)) \in \eta_4(\delta'(p', a))$.

$\beta'_{\delta'(p', a)}(x) = \beta'_{p'}(ax)$
$= \sigma_3(\beta_p(\sigma_1(ax)))$
$= \sigma_3(\beta_{\delta(p, \sigma_1(a))}(\sigma_1(x)))$
$= \sigma_3(\beta_{\delta(p, \eta_1(a))}(\sigma_1(x)))$.

Hence, $\delta(p, \eta_1(a)) \in \eta_4(\delta'(p', a))$.

ii) Let $p \in \eta_4(p')$. We must show
$\bar\sigma_3(\lambda(p, \eta_1(a))) = \lambda'(p', a)$.

$\lambda'(p', a) = \beta'_{p'}(a)$
$= \sigma_3(\beta_p(\eta_1(a)))$
$= \bar\sigma_3(\lambda(p, \eta_1(a)))$.

iii) Let $r' \in R'$. We must show $\rho(\sigma_2(r')) \in \eta_4(\rho'(r'))$.

$\beta'_{\rho'(r')}(x) = \sigma_3(\beta_{\rho(\sigma_2(r'))}(\sigma_1(x)))$ for all $x \in (I')^+$

implies

$\rho(\sigma_2(r')) \in \eta_4(\rho'(r'))$.
(Sufficiency) Suppose there exist functions $(\eta_1, \eta_2, \eta_3, \eta_4)$ as in the
statement of the theorem. Let $\sigma_1 : (I')^+ \to I^+$ be the natural extension
of $\eta_1$ to sequences, that is, $\sigma_1(a_1 \ldots a_n) = \eta_1(a_1) \ldots \eta_1(a_n)$.
Claim: M realizes M' under $(\sigma_1, \eta_2, \eta_3)$. Let $r' \in R'$ and $y \in (I')^*$. By iii),
$\rho(\eta_2(r')) \in \eta_4(\rho'(r'))$, and by repeated application of i),

$p = \delta(\rho(\eta_2(r')), \sigma_1(y)) \in \eta_4(\delta'(\rho'(r'), y))$.

Now let $x = ya$ where $a \in I'$. Then

$\eta_3(\beta_{\rho(\eta_2(r'))}(\sigma_1(x))) = \eta_3(\beta_{\rho(\eta_2(r'))}(\sigma_1(y)\sigma_1(a)))$
$= \eta_3(\lambda(\delta(\rho(\eta_2(r')), \sigma_1(y)), \sigma_1(a)))$
$= \eta_3(\lambda(p, \eta_1(a)))$
$= \lambda'(\delta'(\rho'(r'), y), a)$   (by ii))
$= \beta'_{\rho'(r')}(ya)$
$= \beta'_{\rho'(r')}(x)$.

This completes the proof of Theorem A.1.
The next theorem states that if two reachable states of a state
machine M mimic the same state of another state machine M', then
for any given input the states that they go to under the transition
function $\delta$ also mimic the same state of M'.


Theorem A.2: Let M be a state machine which realizes a state
machine M' under $(\sigma_1, \sigma_2, \sigma_3)$, where … is onto. Then for all $q_1,
q_2 \in P$ and $a \in I$, $\sigma_3(q_1) = \sigma_3(q_2)$ implies $\sigma_3(\delta(q_1, a)) = \sigma_3(\delta(q_2, a))$.


Proof: Let $q_1, q_2 \in P$ and assume that $\sigma_3(q_1) = \sigma_3(q_2)$. Say that
$q_1 = \delta(\rho(r_1), u_1)$ and $q_2 = \delta(\rho(r_2), u_2)$. Since M realizes M', for all
$r' \in R'$, $\sigma_3 \circ \beta_{\sigma_2(r')} \circ \sigma_1 = \beta'_{r'}$. Since M and M' are state machines,
for all $r' \in R'$ and $x' \in (I')^*$,

$\sigma_3(\delta(\rho(\sigma_2(r')), \sigma_1(x'))) = \delta'(\rho'(r'), x')$.

Let $a \in I$, and denote $\sigma_1(x')$ by x and $\sigma_2(r')$ by r. Then

$\sigma_3(\delta(q_1, a)) = \sigma_3(\delta(\rho(r_1), u_1 a))$
$= \delta'(\rho'(r'_1), u'_1 a')$
$= \delta'(\delta'(\rho'(r'_1), u'_1), a')$
$= \delta'(\sigma_3(\delta(\rho(r_1), u_1)), a')$
$= \delta'(\sigma_3(q_1), a')$.

Likewise, $\sigma_3(\delta(q_2, a)) = \delta'(\sigma_3(q_2), a')$. Since $\sigma_3(q_1) = \sigma_3(q_2)$ it now
follows immediately that $\sigma_3(\delta(q_1, a)) = \sigma_3(\delta(q_2, a))$.
Theorem A.3: If M realizes M' and M' is reduced and reachable then
$|Q| \ge |Q'|$.

Proof: Assume that M realizes M' under $(\sigma_1, \sigma_2, \sigma_3)$ and that M'
is reduced and reachable. Then $\beta'_r = \sigma_3 \circ \beta_{\sigma_2(r)} \circ \sigma_1$ for all $r \in R'$.
Let $q' \in Q'$. Then there exist $r \in R'$ and $x \in (I')^*$ such that
$q' = \delta'(\rho'(r), x)$. Now, for all $y \in (I')^+$,

$\beta'_{q'}(y) = \beta'_{\delta'(\rho'(r), x)}(y)$
$= \beta'_r(xy)$
$= \sigma_3(\beta_{\sigma_2(r)}(\sigma_1(xy)))$
$= \sigma_3(\beta_{\delta(\rho(\sigma_2(r)), \sigma_1(x))}(\sigma_1(y)))$.

Hence there exists a function f from Q' into Q such that for each $q' \in Q'$,

$\beta'_{q'} = \sigma_3 \circ \beta_{f(q')} \circ \sigma_1$.

To prove that $|Q| \ge |Q'|$, it suffices to show that f is 1-1. Let
$q'_1, q'_2 \in Q'$ and assume that $f(q'_1) = f(q'_2)$. Then $\beta'_{q'_1} = \sigma_3 \circ \beta_{f(q'_1)} \circ \sigma_1 =
\sigma_3 \circ \beta_{f(q'_2)} \circ \sigma_1 = \beta'_{q'_2}$. Since M' is reduced and reachable this implies
that $q'_1 = q'_2$. Hence f is 1-1. This establishes the result.


Theorem A.4: The relation "realizes" is transitive. That is, M realizes M'
and M' realizes M'' implies M realizes M''.
Proof: (Sketch) Assume that M realizes M' under $(\sigma_1, \sigma_2, \sigma_3)$ and
that M' realizes M'' under $(\sigma'_1, \sigma'_2, \sigma'_3)$. Then $\beta'_{r'} = \sigma_3 \circ \beta_{\sigma_2(r')} \circ \sigma_1$
for all $r' \in R'$ and $\beta''_{r''} = \sigma'_3 \circ \beta'_{\sigma'_2(r'')} \circ \sigma'_1$ for all $r'' \in R''$. It follows
that $\beta''_{r''} = \sigma'_3 \circ \sigma_3 \circ \beta_{\sigma_2(\sigma'_2(r''))} \circ \sigma_1 \circ \sigma'_1$. That is, M realizes
M'' under $(\sigma_1 \circ \sigma'_1, \sigma_2 \circ \sigma'_2, \sigma'_3 \circ \sigma_3)$.


If M and M' are resettable machines then M is isomorphic to M'
if there exist four 1-1 and onto functions

$\omega_1 : I \to I'$
$\omega_2 : R \to R'$
$\omega_3 : Z \to Z'$
$\omega_4 : P \to P'$

such that for all $r \in R$, $a \in I$, and $q \in P$

i) $\omega_4(\delta(q, a)) = \delta'(\omega_4(q), \omega_1(a))$

ii) $\omega_3(\lambda(q, a)) = \lambda'(\omega_4(q), \omega_1(a))$

iii) $\omega_4(\rho(r)) = \rho'(\omega_2(r))$.

The 4-tuple $(\omega_1, \omega_2, \omega_3, \omega_4)$ is called an isomorphism of M onto M'.

If $M, M' \in \mathcal{M}(I, Z, R)$ and $(e, e, e, \omega_4)$ is an isomorphism of M onto M'
(e denoting an identity function), then M is strongly isomorphic to M'. A basic
result of sequential machine theory states that for every machine there is an
equivalent reduced machine and that this machine is unique up to strong
isomorphism. The corresponding result for resettable machines is
given by Theorem A.5 and Corollary A.6.1.

Theorem A.5: For every resettable machine M there is a reduced
and reachable machine $M_R$ equivalent to M.

Proof: Let $M = (I, Q, Z, \delta, \lambda, R, \rho)$ and let $M_R = (I, Q_R, Z, \delta_R, \lambda_R, R, \rho_R)$,
where

$Q_R = \{[q] \mid q \in P\}$   ($[q] = \{q' \mid q' \equiv q\}$)
$\delta_R([q], a) = [\delta(q, a)]$
$\lambda_R([q], a) = \lambda(q, a)$
$\rho_R(r) = [\rho(r)]$

To prove this result we must verify (1) that $\delta_R$ and $\lambda_R$ are well-defined,
(2) that $M_R$ is reduced and reachable, and (3) that $M \equiv M_R$.
The details of this proof are very similar to the details of the
corresponding result in sequential machine theory. They may be
found in many textbooks which cover this theory (e.g., see Arbib
[2]).
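The construction of $M_R$ can be carried out mechanically with the standard partition-refinement (Moore-style) method for computing state equivalence, restricted here to the reachable part P. The Python sketch below is one way to do it, not the procedure used in the report or in Arbib [2]; it names each class [q] by the frozenset of its members, and the function names and the small example machine are hypothetical.

```python
def _canon(keys):
    """Renumber arbitrary hashable class keys as 0, 1, 2, ..."""
    ids = {}
    return {q: ids.setdefault(k, len(ids)) for q, k in keys.items()}


def reduce_machine(inputs, delta, lam, rho, P):
    """Build (Q_R, delta_R, lam_R, rho_R) as in the proof of Theorem A.5.

    P must be the reachable part, so delta maps P into P."""
    # Round 0: states are grouped by their one-step outputs.
    cls = _canon({q: tuple(lam[(q, a)] for a in inputs) for q in P})
    while True:
        # Split classes whose members move to different classes.
        new = _canon({q: (cls[q],) + tuple(cls[delta[(q, a)]] for a in inputs)
                      for q in P})
        if len(set(new.values())) == len(set(cls.values())):
            break  # no class was split: cls groups exactly the equivalent states
        cls = new
    members = {}
    for q in P:
        members.setdefault(cls[q], set()).add(q)
    block = {q: frozenset(members[cls[q]]) for q in P}  # q -> [q]
    Q_R = set(block.values())
    delta_R = {(block[q], a): block[delta[(q, a)]] for q in P for a in inputs}
    lam_R = {(block[q], a): lam[(q, a)] for q in P for a in inputs}
    rho_R = {r: block[q] for r, q in rho.items()}
    return Q_R, delta_R, lam_R, rho_R


# Hypothetical machine in which states A and B are equivalent.
inputs = ["0", "1"]
delta = {("A", "0"): "B", ("A", "1"): "C", ("B", "0"): "A", ("B", "1"): "C",
         ("C", "0"): "C", ("C", "1"): "A"}
lam = {(q, a): (a if q in "AB" else "0") for q in "ABC" for a in inputs}
Q_R, delta_R, lam_R, rho_R = reduce_machine(inputs, delta, lam, {"r": "A"},
                                            {"A", "B", "C"})
print(len(Q_R), sorted(rho_R["r"]))  # 2 ['A', 'B']
```

Each refinement round can only split classes, so the loop stops after at most |P| rounds; the well-definedness of $\delta_R$ and $\lambda_R$ corresponds to the final partition being stable under every input.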

$M_R$ as defined above is called the reduction of M. M' is a
reduced form of M if M' is reduced and $M \equiv M'$.

Lemma A.1: $M \equiv M'$ implies $\beta_{\delta(\rho(r), x)} = \beta'_{\delta'(\rho'(r), x)}$ for all $r \in R$
and $x \in I^*$.
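Proof: Let $a \in I$, $x, y \in I^*$ and $r \in R$. Then

$M \equiv M' \implies \beta_r(xya) = \beta'_r(xya)$
$\implies \lambda(\delta(\rho(r), xy), a) = \lambda'(\delta'(\rho'(r), xy), a)$
$\implies \lambda(\delta(\delta(\rho(r), x), y), a) = \lambda'(\delta'(\delta'(\rho'(r), x), y), a)$
$\implies \beta_{\delta(\rho(r), x)}(ya) = \beta'_{\delta'(\rho'(r), x)}(ya)$.

Since y and a are arbitrary, the result follows.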

Theorem A.6: If M and M' are both reduced and $M \equiv M'$ then M
is strongly isomorphic to M'.

Proof: Assume that M and M' are reduced and that $M \equiv M'$. We
know that each $q \in P$ is representable in the form $\delta(\rho(r), x)$. Define
$\omega_4 : P \to P'$ by

$\omega_4(\delta(\rho(r), x)) = \delta'(\rho'(r), x)$.

Claim: M is strongly isomorphic to M' under $(e, e, e, \omega_4)$. We must
show that $\omega_4$ is well-defined, 1-1 and onto, and that for all $r \in R$,
$a \in I$ and $q \in P$

i) $\omega_4(\delta(q, a)) = \delta'(\omega_4(q), a)$

ii) $\lambda(q, a) = \lambda'(\omega_4(q), a)$

iii) $\omega_4(\rho(r)) = \rho'(r)$.
In the following $\omega_4(q)$ is denoted by q'.

Well-defined: Let $p = \delta(\rho(r), x)$ and $q = \delta(\rho(s), y)$, and suppose that
p = q. Then $\beta_{\delta(\rho(r), x)} = \beta_{\delta(\rho(s), y)}$ and thus, by Lemma A.1,
$\beta'_{\delta'(\rho'(r), x)} = \beta'_{\delta'(\rho'(s), y)}$, that is, $\beta'_{p'} = \beta'_{q'}$. Since M' is reduced and
$p', q' \in P'$, it follows that p' = q'. Hence $\omega_4$ is well-defined.

1-1: Again let $p = \delta(\rho(r), x)$ and $q = \delta(\rho(s), y)$, but now suppose that
$p \ne q$. Then by reapplying the above argument with the roles of M and M'
interchanged (now using that M is reduced), $p' \ne q'$. Hence $\omega_4$ is 1-1.

Onto: Since every $q' \in P'$ is representable in the form $\delta'(\rho'(r), x)$,
$\omega_4$ is onto.

That i), ii), and iii) are satisfied is straightforward to verify.

Corollary A.6.1: The reduced form of M is unique up to strong
isomorphism. That is, if M' and M'' are reduced forms of M then
M' is strongly isomorphic to M''.

Proof: If M' and M'' are reduced forms of M then $M \equiv M'$ and
$M \equiv M''$. Hence $M' \equiv M''$. Since M' and M'' are both reduced, by
Theorem A.6, M' is strongly isomorphic to M''.

Theorem A.7: If $M \equiv M'$ then M realizes M'.

Proof: $M \equiv M'$ implies $\beta_r = \beta'_r$ for all $r \in R$. Hence M realizes M'
under (e, e, e).
A resettable machine M is autonomous if $|I| = 1$.

Given a resettable machine M, two input symbols $a, b \in I$ are
equivalent ($a \equiv b$) if $\lambda(q, a) = \lambda(q, b)$ and $\delta(q, a) = \delta(q, b)$ for all $q \in P$.
M is transition distinct if no two of its input symbols are equivalent.
Any machine which has equivalent inputs is redundant in the sense
that the inputs in an equivalence class can be represented by any one
of its members without affecting the capabilities of the machine. The
following result gives an alternative characterization of equivalent
inputs.

Theorem A.8: Let M be a resettable machine, and let $a, b \in I$. Then
$a \equiv b$ if and only if for all $x, y \in I^*$ and $r \in R$, $\beta_r(xay) = \beta_r(xby)$.

Proof: (Necessity) Suppose $a \equiv b$ and assume, to the contrary,
that $\beta_r(xay) \ne \beta_r(xby)$ for some $r \in R$ and $x, y \in I^*$. Let $q = \delta(\rho(r), x)$.
Now, $\beta_r(xay) \ne \beta_r(xby)$ implies $\beta_q(ay) \ne \beta_q(by)$. If $y = \Lambda$ then
$\lambda(q, a) \ne \lambda(q, b)$. If $y \in I^+$ then $\beta_{\delta(q, a)}(y) \ne \beta_{\delta(q, b)}(y)$ and hence
$\delta(q, a) \ne \delta(q, b)$. Therefore $a \not\equiv b$, a contradiction. Hence $a \equiv b$
implies $\beta_r(xay) = \beta_r(xby)$ for all $x, y \in I^*$ and $r \in R$.

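(Sufficiency) Assume that $a \not\equiv b$. Then for some $q \in P$, $\lambda(q, a) \ne
\lambda(q, b)$ or $\delta(q, a) \ne \delta(q, b)$. Let $q = \delta(\rho(r), x)$. Then $\lambda(\delta(\rho(r), x), a) \ne
\lambda(\delta(\rho(r), x), b)$ or $\delta(\rho(r), xa) \ne \delta(\rho(r), xb)$. Hence $\beta_r(xa) \ne \beta_r(xb)$ or,
provided the distinct reachable states $\delta(\rho(r), xa)$ and $\delta(\rho(r), xb)$ are not
equivalent (as is the case when M is reduced), $\beta_r(xay) \ne \beta_r(xby)$ for some
$y \in I^+$. Therefore if $\beta_r(xay) = \beta_r(xby)$ for all $r \in R$ and $x, y \in I^*$ then $a \equiv b$.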
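Checking the definition of equivalent inputs directly is straightforward for a finite machine: two symbols are equivalent exactly when they produce the same output and the same next state at every reachable state. The Python sketch below groups the input alphabet accordingly and tests transition distinctness. It works from the definition rather than from the behavioral criterion of Theorem A.8, and its function names and example machine are hypothetical.

```python
def input_classes(inputs, delta, lam, P):
    """Group input symbols a, b with lam(q,a) == lam(q,b) and
    delta(q,a) == delta(q,b) for every q in the reachable part P."""
    order = sorted(P)  # fixed order, so signatures are comparable
    classes = {}
    for a in inputs:
        signature = tuple((lam[(q, a)], delta[(q, a)]) for q in order)
        classes.setdefault(signature, []).append(a)
    return list(classes.values())


def transition_distinct(inputs, delta, lam, P):
    """True if no two input symbols are equivalent."""
    return all(len(c) == 1 for c in input_classes(inputs, delta, lam, P))


# Hypothetical 2-state machine: inputs "a" and "b" both toggle the state.
P = {0, 1}
inputs = ["a", "b", "c"]
delta = {(q, x): (1 - q if x in "ab" else q) for q in P for x in inputs}
lam = {(q, x): q for q in P for x in inputs}
print(input_classes(inputs, delta, lam, P))        # [['a', 'b'], ['c']]
print(transition_distinct(inputs, delta, lam, P))  # False
```

Keeping one representative from each class (for instance the first) yields a transition-distinct machine, in line with the remark that equivalent inputs can be represented by any one member of their class.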
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